Method of Determining Disease Causality of Genome Mutations

ABSTRACT

A method of identifying a gene or genomic mutation that is linked to causality of a neuropsychiatric disorder is provided. The method comprises identifying exons which exhibit an expression level that is at least within the 75 th  percentile of exon expression levels within a nucleic acid-containing sample from a mammal having a neuropsychiatric disorder; comparing the sequence of each identified exon to the sequence of a corresponding exon from a healthy control to identify rare or de novo sequence mutations within the identified exon; calculating the burden of rare or de novo mutations within the exon; and determining the correlation between expression level of the identified exon and burden of de novo or rare mutations in the exon, wherein an inverse correlation indicates that the exon gene is linked to causality of the neuropsychiatric disorder.

FIELD OF THE INVENTION

The present invention relates generally to methods of identifying thesignificance of gene mutations, and more particularly, to a method ofdetermining the disease causality associated with a genomic mutation.

BACKGROUND OF THE INVENTION

Rare genomic variants are being revealed as preponderant genetic riskfactors for neuropsychiatric conditions such as Autism Spectrum Disorder(ASD). ASD exhibits extensive clinical and genetic heterogeneity withhigh heritability (37-90%), for ˜20% of affected individuals, anunderlying genetic alteration is attributed as the cause. Chromosomalabnormalities, copy number variations (CNVs), smallerinsertion/deletions (indels), single nucleotide variations (SNVs) andcombinations thereof, have been described in ASD. Indeed, >100 ASDsusceptibility genes of variable (and still largely undetermined)penetrance and expressivity are known, with several hundred othersestimated to exist. A similar genetic architecture is also starting toemerge in other brain disorders such as schizophrenia (SZ), intellectualdisability (ID) and epilepsy, and some genes are implicated across thesedisorders.

Proving causality of a particular DNA sequence variant(s) in a complexdisorder such as ASD can be complicated by the exceptional rarity ofmutations involved (often unique to individuals or families), and therate of new mutation, which is often equal in controls. For example, therate of de novo SNV-indels is similar (˜1 per exome) betweenASD-affected and -unaffected individuals (population controls orunaffected siblings). Regarding CNVs, a slightly increased burden of denovo events has been observed in ASD cases compared to family andpopulation controls, but these deletions and duplications usually affectmultiple genes complicating attempts in simple genotype-phenotypecorrelations. Through genotyping massive cohorts, there has been someprogress identifying critical exons for clinical manifestation of ASD inthe NRXN, NRXN3 and UTS2 genes. However, a more robust approach isneeded to find those rare etiologic alterations amongst thousands ofother variants generated in genome scanning experiments.

SUMMARY OF THE INVENTION

It has now been determined that exons of genes harboring deleteriousdisease-related mutations were found to be significantly enriched and toexhibit an inverse relationship with burden of rare or de novo mutationin individuals having a disease condition such as a neuropsychiatricdisorder.

Accordingly, in aspects of the present invention, a method ofidentifying one or more genes or genome sequence mutations linked tocausality of disease is provided. The method comprises the steps of:

-   -   i) identifying exons which exhibit an expression level that is        greater than the 75^(th) percentile of exon expression levels in        nucleic acid from one or more individuals having a disease;    -   ii) comparing the sequence of each identified exon to the        sequence of a corresponding exon from a healthy control to        identify rare or de novo sequence mutations within the        identified exon;    -   iii) calculating the burden of rare or de novo mutations within        the exon; and    -   iv) determining the correlation between expression level of the        identified exon and burden of de novo or rare mutations in the        exon, wherein an inverse correlation indicates that the exon        gene and at least one of said sequence mutations is linked to        causality of the disease.

A computer system for conducting a method to determine genes linked tocausality of a disease, as well as a computer program product,comprising a tangible computer-readable medium carrying computer-usableinstructions which, when executed by a processing unit of a computer,cause the processing unit to identify one or more genes linked tocausality of a disease using the above method, are also provided.

In another aspect, a method of diagnosing a disease in a mammal isprovided comprising the steps of:

-   -   i) identifying, using a processing device, mutant exons in a        nucleic acid sample of the mammal comprising rare or de novo        sequence mutations as compared to the sequence of corresponding        exons from one or more healthy controls;    -   ii) comparing, using a processing device, mutant exons to        critical exons associated with the disease identified using a        method as defined in claim 1 to determine if the mutant exons        are critical exons; and    -   iii) optionally, if a mutant exon is determined to be a critical        exon, then tissue type and protein network of the mutant exon is        determined and compared to that of the critical exon, wherein        the mammal is determined to have the disease when the mutant        exon has the same tissue type and protein network of the        critical exon.

A computer system for conducting a method of disease diagnosis is alsoprovided, as well as a computer program product, comprising a tangiblecomputer-readable medium carrying computer-usable instructions which,when executed by a processing unit of a computer, cause the processingunit to diagnose disease using the above method.

These and other aspects of the invention will be described in thedetailed description by reference to the following figures.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1. Assembling an exon transcriptome/mutation contingency index. Acomposite gene model was generated and used to integrate exon expressionand rare missense mutation levels in controls. As a result an ‘exontranscriptome/mutation contingency index’ was generated. A ‘criticalexon’ (d) was defined within the contingency index as one beinghighly-expressed relative to others (a, b, d), and at the same timebeing suppressed for the accumulation missense mutations in populationcontrols (based on 75^(th) percentile thresholds from the data). Usingthis transcriptome/mutation contingency index, enrichment analyses wereperformed for genes implicated in ASD and other disorders. The entireset of ‘critical exons’ (3,955 in 1,744 genes) were further analyzed forfunctional enrichment and disease association testing.

FIG. 2. Principal component analysis (PCA) to capture expressionvariability between brain samples. (A) Heatmap from principal componentanalysis showing temporal sample clustering captured by component 1(PC1) loadings. The color gradient from red to blue represent theexpression profile variation captured by the loadings. (B) Component 2(PC2) loadings captures the variation within the prenatal (lateprenatal) to early childhood samples.

FIG. 3. Inverse correlation between burden of rare missense mutationsand exon expression level. (A) Integration of exon expression level andrare missense mutation burden for SCN2A. (B) Genome-wide associationbetween rare missense mutation burden and exon expression level in 11human tissues (each in triplicate). (C) Exon expression level in 196postmortem brain samples (controls) (from 13 male and female donors) inthree developmental periods (prenatal, early childhood and adult) for 16brain regions (11 neocortex regions (NCX), cerebellar cortex (CBC),hippocampus (HIP), Amygdala (AMY), mediodorsal nucleus of thalamus (MD),striatum (STR)). The spatio-temporal correlation coefficient shown inthree panels was computed for genes reported to have de novo mutations(SNVs-indels) in ASD cases or unaffected siblings, ID and SZ. The lastpanel shows aggregate spatio-temporal correlation coefficient (averageexpression from samples for a developmental period) for genesascertained in unaffected ASD siblings or in any of three diseasesubject groups.

FIG. 4. Pearson correlation co-efficient between burden of rare missenseand prenatal brain expression for known ASD genes SHANK1, SHANK2, FMR1 ,AFF2(FMR2), and SCN2A. For these genes multiple independent ASD studiesreported deletions (red square), duplications (blue square), pointmutations (black triangle) or functional evidence from mouse (M)knockout studies.

FIG. 5. Clustering of expression data. (A) The hierarchical clusteringdendogram of replicates (A, B, C) for tissue types. The expressionclustering for cerebellum and testis shows unique variance compared tothe other tissues. (B) Principal component 1 (PC1) and 2 (PC2) plotted(for all replicates) for the tissue types capturing variance from theircorresponding expression profile.

FIG. 6. Distribution of conservation scores for gene exons reported tohave de novo SNV/indel mutations in probands or siblings. phastCons andphyloP shows similar conservation patterns for both probands andsiblings (mean PhastCons=0.566 and 0.563; phyloP=0.62 and 0.56,respectively). (A) PhastCons score histograms showing computed scoresfor exons of the gene reported to have de novo mutations in probands orsiblings. (B) The distribution of phyloP showing minimum, maximum andaverage phyloP scores computed for each exon.

FIG. 7. Distribution of rare missense mutations and age of the mutationsfor genes ascertained through probands or siblings. (A) Distribution ofrare missense mutation burden for genes that harbor de novo mutations inprobands and siblings (mean 0.0096 and 0.01, respectively). (B) Alldetected mutations (common and rare) reported in the exome sequencingdatabase for the genes ascertained in probands or siblings, and mutationage (computed applying coalescent model (9)). (C) Only the rare missensemutations (<0.05 frequency) for genes ascertained through probands orsiblings; these potentially deleterious mutations arose within the last5,000 years.

FIG. 8. Splicing code analysis of de novo SNVs. (A) The CDFs ofcode-predicted Atlfs for probands and unaffected siblings de novomutations. Overall proband mutations do not have larger predicted ΔΨs,and the K-S test does not reject the null hypothesis. (B) The ROC curvefor discriminating probands from controls using code-predicted ΔΨs; areaunder ROC curve (AUC) is nearly 0.5.

FIG. 9. Highly expressed exons with de novo mutations. A) percentages ofbrain tissues that have high expression (>75th percentile) of the exonscomprising de novo mutations that were ascertained in cases andunaffected siblings. B) box plots show the percentages of de novomutations (for cases and unaffected siblings) that can be classified ascritical exons (left) or tolerated exons (right) using all transcriptomesamples. Each confidence interval represents the fraction of variabilityin the number of critical or tolerated exons obtained from differenttranscriptome samples (interquartile range (IQR) is between the firstand third quartiles). C) the burden of rare missense mutations andexpression in human developing brain of the SLC6A1 gene, for whichmutated exons have been reported in ASD cases. To capture the exoninclusion/exclusion event for the mutated exons in prenatal brain andadult brain, RT-PCR assays were designed considering variouscombinations of the flanking exons. Exclusion events (defined by a linefor an exon) were confirmed by Sanger sequencing (Table 5). Partialexclusion of exon 3 of the SLC6A1 gene was also captured owing to thevariable length of this exon in different isoforms.

FIG. 10. Validation of RNA-seq for de novo proband and sibling gene.Spatio-temporal association analysis where −log(P) value is representedby color intensities and the odds ratio (OR, 95% CI) on y-axis for genesreported to have de novo mutations in ASD probands (upward bar graph)and unaffected siblings (inverted bar graph) within three developmentaltime periods: (A) Prenatal, (B) early childhood and (C) adult samples.Similar to microarray data (FIG. 2A), we have observed strongestassociation within neocortex prenatal samples (Bonferroni corrected,p<7.8×10⁻²⁸; OR=2.142).

FIG. 10. Classifying missense/LOF exons with de novo mutations in ASDproband and unaffected siblings: (D) box plots show the percentages ofde novo mutations (for cases and unaffected siblings) that can beclassified as critical exons (left) or tolerated exons (right) using alltranscriptome samples. Each confidence interval represents the fractionof variability in the number of critical or tolerated exons obtainedfrom different transcriptome samples (interquartile range (IQR) isbetween the first and third quartiles). (E) Applying 75th percentilecutoff on the transcriptome/mutation contingency index, we have computedthe fraction of LOF that can be classified as ‘critical’ and ‘tolerated’exons using all transcriptome samples. The confidence intervalrepresents the fraction variability of ‘critical’ or ‘tolerated’ exonsobtained from different transcriptome samples.

FIG. 11. Exon skipping event for a loss of function (LOF) mutationreported in an unaffected sibling. Exon 17 (highlighted red) reported insibling gene (EPB41L3) show frequent exclusion mostly in prenatal andadult brains. The exclusion event was confirmed by Sanger sequencing.

FIG. 12. Association between burden of rare missense mutations andexpression (using 11 distinct human tissues, each with 3 replicates) forgenes for nervous system disorders (per Online Mendelian Inheritance inMan (OMIM)). The phenotypes are classified (in OMIM) according to modeof inheritance (autosomal dominant (A)/recessive (B)). FET significancefor each tissue replicate plotted where the −log(P) on left y-axis andodds ratio (OR) on right y-axis. Cerebellum replicates show thestrongest association signals only for dominant nervous system disordergenes.

FIG. 13. The association between burden of rare missense mutations andexpression in various prenatal (A), early childhood (B) and adult (C)brain tissues for fragile X mental retardation protein targets (FMRP)and forkhead box P2 (FOXP2) targets. The plot shows significantassociation (−log(P)—Fisher exact test in y axis; odds ratio (OR)—inx-axis) for FMRP targets (color circles) but not for FOXP2 targets thatshow evidence of positive selection (grey circles). Size of each circlerepresents the computed odds ratio.

FIG. 14. Association of exon transcriptome and rare missense mutationburden using spatio-temporal expression for genes within de novo CNVsascertained in ASD probands or controls. De novo losses/gainscomprehensively ascertained in four earlier studies of ASD (AGP (stage 1and 2), SSC1 and SSC2) were assayed in data from 16 brain regions ofpostmortem donors from three developmental periods (prenatal (P), earlychildhood (EC), and adult (AD)). The analysis plotted for 196 developinghuman brain samples (P-prenatal, E-early childhood, and A-adult)includes 11 neocortex (NCX) regions (FIG. 4A), (A) amygdala (AMY), (B)cerebellar cortex (CBC), (C) hippocampus (HIP), (D) mediodorsal nucleusof the thalamus (MD) and (E) striatum (STR).

FIG. 15. Delineation and profiling of brain specific critical exons. (A)Association analysis of spatio-temporal expression for genes within denovo CNVs ascertained in four earlier studies of ASD (AGP (stage 1 and2), SSC1 and SSC2) were assayed in data from 16 brain regions ofpostmortem donors from three developmental periods (prenatal (P), earlychildhood (EC), and adult (AD)). The plot shows spatio-temporalassociation results between exon expression level (11 neocortex regions)and burden of rare missense mutations for genes within de novo CNVsreported in ASD Probands and in controls (see FIG. 14 for amygdala,cerebellar cortex, hippocampus, mediodorsal nucleus of thalamus, andstriatum transcriptome association results and RNA-Seq validation). (B)Brain specific ‘critical exon’ enrichment analysis on FMRP targets, PSP,ASD (autosomal dominant, X-linked (XL)) and genes with de novo mutationsrepresented by the vertical line and the p-values (FET; BenjaminiHochberg corrected) relative to the two corresponding backgrounds(entire RefSeq exons set (background 1) and highly brain expressed exons(background 2)). The empirical p-value was obtained from the permutationtest for each background (10,000 random draw of equal number of exons).(C) A functional map of brain specific ‘critical exons’ by mappingenrichment significance as a network of gene sets (nodes), where edgesrepresent mutual overlap between gene sets. The size of the node isproportional to the number of genes in each gene set, and the node colorrepresents the significance (p-value) of overlap. Nodes that arefunctionally interconnected are similarly circled (implicated inprevious ASD networks) into one major group and labeled with theircorresponding function.

FIG. 16. The RNA-seq validation for genes within de novo CNVsascertained in ASD probands or controls. Applying 75^(th) percentilecutoff (similar to microarray), the association significance (FET; lefty-axis −log(P)) for three developmental periods (prenatal (P), earlychildhood (EC), and adult (AD)) on 196 human brain samples from (A)amygdala (AMY), (B) cerebellar cortex (CBC), (C) hippocampus (HIP), (D)mediodorsal nucleus of the thalamus (MD) and (E) striatum (STR). Barsillustrate −log(P) for genes within rare CNVs ascertained in AGP stage 1and 2 controls. Similar to microarray, from RNA-seq analysis we observedthe strongest association in neocortex from prenatal samples (FET,Bonferroni corrected, p<3.13×10⁻¹¹; OR=1.45).

FIG. 17. de novo/rare single genic losses reported in AGP stage 1 and 2and their association analysis. Spatio-temporal microarray analysisusing human developmental brain for de novo/rare (<0.01 frequency)single gene losses (impacting coding sequence) exclusively identified inprobands or controls. Significance with higher odds ratio (OR) wasobserved within genes ascertained for probands in prental neocortexsamples (A) amygdala (AMY), (B) cerebellar cortex (CBC), (C) hippocampus(HIP), (D) mediodorsal nucleus of the thalamus (MD) and (E) striatum(STR).

FIG. 18. Tissue-specific ‘critical exons’. (A) Relative proportion oftissue-specific critical exons detected from 11 tissues. (B) Exondensity distribution for genes containing ‘brain-critical exons’. (C)Genes were scored based on the number of exons relative to mean numberof tissue specific critical exons in the list. The plot shows exonicscores for the genes reported in—ASD (AD and X-linked), de novo SNVs,FMRP targets, and PSP.

FIG. 19. Overlap of ‘brain-critical exons’ within the 16p11.2 deletionlocus. (A) Overlapping ‘brain-critical exons’ for genes within the16p11.2 locus and their corresponding exon score is plotted. Thehorizontal bars represent the recurrent breakpoint reported in DECIPHERdatabase, the narrow critical region captured in a multiplex family withsimilar phenotypes. (B) Relative expression (2{circumflex over( )}(−dCt)) from quantitative real-time PCR (qRT-PCR) relative to ACTB(replicated with another housekeeping gene MED13) shows brain (adultcerebellum) specific higher expression of the identified ‘brain-criticalexons’ in SEZ6L2, ASPDH1 , KCTD13, and PRRT2 within 11 differenttissues, consistent with microarray expression intensities.

FIG. 20. Genes from 3q29 and 17q21.31 loci containing ‘brain-criticalexons’, and their expression profile from multiple tissues. (A) Within3q29, recurrent micro-duplication/deletions are associated withsyndromes. Genes containing critical exons from this locus are: discslarge homolog 1 (DLG1) and phosphatidylinositol glycan anchorbiosynthesis, class Z (PIGZ). DLG1 is an ASD candidate gene within thelocus and independent reports are emerging to establish DLG1 as aprimary candidate gene for 3q29 micro duplication/deletion syndrome.There is also emerging evidence for the potential role of DLG1-4 incognition and psychiatric disorder from mouse knockouts. A recent studyon PIGZ knockout mouse experiment showed a significant increase ofamyloid-β (Aβ) peptide (a molecule with pathological role forAlzheimer's disease). (B) Multiple brain specific critical exons (5exons) within corticotropin-releasing hormone receptor 1 (CRHR1) geneand microtubule-associated protein tau (MAPT) gene were detected within17q21.31 deletion locus. Multiple lines of evidence suggest the role ofcorticotropin-releasing hormone receptor 1 (CRHR1) on neuropsychiatricdisorders. MAPT is also a well-studied candidate gene forneuropsychiatric disorders.

FIG. 21. Genes for neurofibromatosis 1 (NF1) microdeletion syndrome and22q11.21 loci containing ‘brain-critical exons’ and their expressionfrom multiple tissues. (A) The NF1 micro-deletion syndrome involves twoprimary breakpoints with variable phenotypes. Both breakpoints affectcoding sequence of NF1 and RAB11 family interacting protein 4 (RAB11FIP4) genes and the cases also have intellectual disabilty as a commonphenotypic manifestation. NF1 is reported to be the primary candidategene in the locus by multiple studies and in addition RAB11FIP4 shown tobe a candidate gene from gene expression analysis. (B) Within the22q11.21 locus, four genes (armadillo repeat gene deleted invelocardiofacial syndrome (ARVCF), DGCR8 microprocessor complex subunit(DGCR8), zinc finger, DHHC-type containing 8 (ZDHHC8) andphosphatidylinositol 4-kinase, catalytic, alpha (PI4KA)) were found tohave ‘brain-critical exons’. All four genes within these loci implicatedpreviously in multiple neuropsychiatric phenotype studies. ARVCF, DGCR8,ZDHHC8 are the three well documented candidate genes.

FIG. 22 illustrates critical-exon enrichment analysis of prenatalpathogenic (A), VOUS (B) and control samples from individuals withdevelopmental delay.

FIG. 23 illustrates a schematic of the analysis framework usinggenome-wide human (prenatal and adult) brain transcriptome (RNAsequencing) and proteome (Fourier-transform mass spectrometry) data toidentify candidate genes from copy number variation in developmentaldelay.

FIG. 24. Ascertainment of pathogenic variants or variants of uncertainsignificance (VOUS) in 10,619 developmental delay cases. The rare CNVsof 30 kb to 5 Mb were classified as pathogenic or variant of uncertainsignificance (VOUS). (A) Bars indicate the total number of CNVs in eachclassification (deletion in red and duplication in blue). Relative toall samples assayed, 4.19% and 1.81% of all samples tested carry apathogenic deletion and duplication, respectively; whereas, 18.28% and31.97% of samples carry a VOUS deletion and duplication, respectively.The green line represents the number of unique genes impacted by thecorresponding variants. (B) The percentage of male and female cases inthe cohort impacted by pathogenic deletion variants. P value shown isfor the one-sided Fisher's exact test.

FIG. 25. The fraction of critical exons (over all exons) computed fromhuman prenatal brain regions for the genes impacted by pathogenic, VOUSand rare control deletion and duplication variants. The critical exonfraction was computed using gene expression level quantified from RNAsequencing in 392 brain tissues (controls) from 31 postmortem donors in2 developmental periods (prenatal and adult) for 16 brain regions (AMY,amygdaloid complex; CBC, cerebellar cortex; V1C, primary visual cortex;STC, posterior (caudal) superior temporal cortex; IPC, posteriorinferior parietal cortex; A1C, primary auditory cortex; S1C, primarysomatosensory cortex; M1C, primary motor cortex; STR, striatum; DFC,dorsolateral prefrontal cortex; MFC, medial prefrontal cortex; VFC,ventrolateral prefrontal cortex; OFC, orbital frontal cortex; MD,mediodorsal nucleus of thalamus; ITC, inferolateral temporal cortex;HIP, hippocampus). The critical exon fraction computed using prenatalbrain transcriptome is shown for the genes impacted by pathogenic (reddots) and VOUS (orange dots) (A) deletions and (B) duplications incomparison to genes impacted by rare control deletions (gray dots).

FIG. 26. Genome-wide protein co-expression and enrichment of genesascertained from pathogenic and VOUS CNVs using human prenatal and adulttissues. (A) The protein modules generated by weighted gene coexpressionnetwork analysis (WGCNA) using high resolution genome-wideFourier-transform mass spectrometry data from 30 histologically normalhuman samples (prenatal and adult). Each coloured bar (dispersed)represents a module. (B) For the blue module, the 20 most significantresults from quantitative association with 18,826 gene sets. (C)Representation of the ‘blue’ module as a functional network, where eachnode is a gene; the edge between genes represents the weighted Pearsondistance. Red nodes represent genes ascertained through CNVs in thedevelopmental delay cohort. (D) The top (25^(th) percentile) criticalexon genes in the genome ascertained using prenatal (green) and adultbrain (purple) transcriptomes, and their corresponding quantifiedenrichment in the protein module, through Fisher's exact test (FET) and100,000 permutations. The orange bar represents the original observationof overlaps between blue module and a gene set. Similarly, (E) and (F)show the enrichment of genes impacted by pathogenic and VOUSduplications (blue) and deletions (red) within the protein module.

FIG. 27. Deletions within the GIT1 gene identified in developmentaldisorder cases or controls. (A) The breakpoints of 12 VOUS deletions(red) and duplications (blue) impacting GIT1 and nearby genes. Thedataset includes 5 de novo deletions/duplications (denoted as DN)reported from developmental cases; no rare CNV was found in controls.(B) The analysis of the human protein co-expression network revealedthat GIT1 is within the blue module and is highly connected (1-degreeneighbors) with genes enriched for ‘critical exons’ (red nodes) andputative ASD genes reported to have de novo mutations (red node withblack outline). (C) Expression of GIT1 (primer targeting critical exons)from quantitative real-time PCR (qRT-PCR) relative to housekeeping gene,MED13 (replicated with another housekeeping gene ACTB) in 11 differenttissues.

FIG. 28. Deletions within MVB12B gene identified in developmentaldisorder cases and controls. (A) The breakpoints of 18 VOUS deletions(red) and duplications (blue) impacting MVB12B and nearby genes. Thebreakpoints include 12 de novo VOUS reported from developmental delaycases, including 3 recurrent breakpoints (6D-DN, 4G-DN, 3G). All de novoVOUS impact the smaller isoform (highlighted by dash line) of the gene(NM_001011703.2) which is not impacted by CNV in controls. (B) The humanprotein co-expression network revealed that the MBV12B gene is thewithin the blue protein module and enriched for ‘critical exons’ (rednodes) and putative ASD genes reported to have de novo mutations (rednode with black outline). (C) Expression of MVB12B (primer targetingcritical exons) from quantitative real-time PCR (qRT-PCR) relative tohousekeeping gene, MED13 (replicated with another housekeeping geneACTB) in 11 different tissues.

FIG. 29 is a block diagram illustrating a processing system suitable foruse in aspects of the invention.

FIG. 30A is a block diagram illustrating a process for identifying oneor more genes linked to causality of a disease, according to anembodiment of the invention.

FIG. 30B is a block diagram illustrating a process for identifyingdisease in a disease cohort, according to an embodiment of theinvention.

DETAILED DESCRIPTION OF THE INVENTION

A method of identifying a gene or genomic mutation that is linked tocausality of a disease is provided. The method comprises identifyingexons which exhibit an expression level that is greater than the 75^(th)percentile (third quartile) of exon expression levels in nucleic acidfrom one or more individuals having the disease; comparing the sequenceof each identified exon to the sequence of a corresponding exon from ahealthy control to identify rare or de novo sequence mutations withinthe identified exon; calculating the burden of rare or de novo mutationswithin the exon; and determining the correlation between expressionlevel of the identified exon and burden of de novo or rare mutations inthe exon, wherein an inverse correlation indicates that the exon gene islinked to causality of the disease.

The term “disease” is used herein to refer to a pathological conditionin a mammal, including human and non-human mammals. The disease may, forexample, be a pathological condition in a developmental process of thecentral nervous system and brain, heart, kidney, or other organs.Examples of disease conditions thus include, but are not limited to,neurological disease such as neuropathies and neuropsychiatricdisorders, cardiomyopathy including congenital heart disease, cancers,diabetes, and developmental disorders.

The term “neuropsychiatric disorder” is used herein to encompass AutismSpectrum Disorder (ASD), i.e. a disorder that results in developmentaldelay of an individual such as autism, Asperger' s Disorder, ChildhoodDisintegrative Disorder, Pervasive Developmental Disorder-Not OtherwiseSpecified (PDD-NOS) and Rett Syndrome (APA DSM-IV 2000); schizophrenia(SZ), intellectual disability (ID), Fragile X syndrome, epilepsy andrelated nervous system disorders such as OMIM (Online MendelianInheritance in Man) nervous system disorders.

In the present method, nucleic acid from one or more individuals, e.g.humans, having a disease of interest, a nucleic acid sample set, isstudied and compared to corresponding nucleic acid control samples, i.e.nucleic acid samples from healthy individuals that do not have thedisease. Nucleic acid from various biological samples may be studied,depending on the disease of interest, for example, samples fromcerebellum, breast, heart, liver, muscle, kidney, thyroid, pancreas,prostate, spleen, and testis. For neurological disease, includingneuropsychiatric disorders, preferred biological samples include brainsamples, for example, neocortex, amygdala, cerebellar cortex,hippocampus, mediodorsal nucleus of the thalamus, and striatum. Forcardiomyopathy, nucleic acid from the heart and related tissue ispreferred. The samples may be obtained from any age group, includingprenatal, child and adult samples.

The method comprises identifying exons within nucleic acid from one ormore individuals (that have a selected disease, e.g. a neuropsychiatricdisorder) that exhibit a high expression level relative to theexpression of other exons within the sample, i.e. a high expressionlevel is considered to be a level that is greater than the 75^(th)percentile of exon expression levels of all exons within the nucleicacid sample or data set. Such exon identification is conducted usingmethods well-established in the art, and will employ automatedcomputerized methods.

Rare or de novo sequence mutations within exons exhibiting a highexpression level are then identified by sequence comparison withcorresponding control exon sequence, e.g. the nucleic acid sequence of acorresponding exon from a healthy control mammal that does not have thedisease of interest. Such sequence comparison is conducted using methodswell-established in the art, and will employ automated computer sequencecomparison. Sequence mutations may include missense, frameshift,nonsense, splice-site variants and copy number variations (CNVs). “Rare”mutations are those which occur at a frequency of less than 0.05. Amutation not possessed by either parent of a sample being analyzed isconsidered to be a “de novo” mutation.

Once rare and de novo mutations within a given exon are identified, theburden of rare and de novo mutations within the exon is calculated.Mutation burden is defined as the mutation count, e.g. of rare and denovo mutations, divided by the length of the exon. Low mutation burdenis considered to be a mutation burden that is less than the 75^(th)percentile of a given sample or dataset.

The correlation between expression level of an exon that exhibits a highexpression level, i.e. an expression level greater than the 75^(th)percentile of exon expression levels in the sample, and burden of rareand de novo mutations within the exon is then calculated. An inversecorrelation between exon expression level and mutation burden indicatesthat at least one of the rare or de novo mutations is linked tocausality of the disease. It also provides an indication that the genein which the subject exon exists is linked to causality of the disease.Further analysis may be conducted to determine the mutation that islinked to causality of the disease. Such calculations (mutation burdenand correlation) and analyses will employ automated computerizedmethods.

In one embodiment, to determine a list of candidate genes for analysis,e.g. candidate genes related to a given disease or condition, a geneco-expression analysis may be conducted based on the proteome andenriched for genes encoding proteins relevant to the disease, forexample, a gene co-expression analysis of genes related to developmentaldelay may be conducted in the determination of causality of aneuropsychiatric disorder. Genes which are identified to exhibitco-expression above a threshold expression, and which exhibit an inversecorrelation between expression level and mutation burden are linked tocausality of a given disease, such as a neuropsychiatric disorder. Inthis regard, a gene was deemed to be significant with respect tocausality of a neuropsychiatric disorder if its critical exon fractionfell within the genome's top 25^(th) percentile for at least 50% of thebiological samples.

Various methods have been developed for constructing gene co-expressionnetworks. In principle, each are based on a two step approach:calculating co-expression measure, and selecting significance threshold.In the first step, a co-expression measure is selected and a similarityscore is calculated for each pair of genes using this measure. Then, athreshold is determined and gene pairs which have a similarity scorehigher than the selected threshold are considered to have a significantco-expression relationship. Co-expression measures for constructing geneco-expression networks may include Pearson's correlation coefficient,Mutual Information, Spearman's rank correlation coefficient andEuclidean distance. A Weighted Gene Co-expression Network Analysis(WGCNA) may also be conducted which selects the threshold forconstructing the network based on the scale-free topology of geneco-expression networks. This method constructs the network for severalthresholds and selects the threshold which leads to a network withscale-free topology. Moreover, the WGCNA method constructs a weightednetwork which means all possible relationships appear in the network,but each is weighted to show the significance of the co-expressionrelationship. Such gene co-expression analyses are conducted usingmethods well-established in the art, and will employ automatedcomputerized methods.

The present method may be used to identify genes linked to diseasecausality of disease conditions such as those involved in anydevelopmental process with respect to the brain, blood, heart, kidney,and others, and that might lead to neurological disease, cardiomyopathy,stroke, cancer and diabetes. As will be appreciated by one of skill inthe art, depending on the condition for which genes related to causalityare to be determined, the biological sample may vary. In addition, wherea gene co-expression analysis is conducted, enrichment for genesencoding proteins relevant to the selected disease condition may beemployed and may be different from that described herein fordevelopmental delay.

As one of skill in the art will appreciate, steps of the presentapplication are generally carried out using an appropriate processingdevice, e.g. a computer. For example, the steps of identifying exonswhich exhibit an expression level that is greater than the 75^(th)percentile of exon expression levels within a sample and, optionally,conducting a gene co-expression analysis, comparing the sequence of eachidentified exon to the sequence of a corresponding exon from a healthycontrol to identify sequence mutations, calculating the burden of rareor de novo mutations within the exon, determining the correlationbetween expression level of the identified exon and burden of de novo orrare mutations in the exon, as shown in FIGS. 30A and 30B, may each beconducted using a processing device or computer to analyze datasets ofexons, conduct comparisons against controls and then to do calculations.In order for the present methods to be practiced in a practical manner,the use of a processing device, e.g. a computer, is essential. Forexample, due to the volume of data to be analyzed and the methods bywhich the data is analyzed (e.g. complex computations), if the methodwere practiced without the use of a processing device it would requirean excessive amount of time and effort and would not provide a resultwithin a reasonable period of time. A benefit of the present inventionis that that method can be performed such that its results may have apractical effect with respect to diagnosis and subsequent treatment.

An appropriate processing device for use to conduct each step of thepresent method may include any suitable computer or microprocessor-basedsystem, such as a desktop or laptop computer or a mobile wirelesstelecommunication computing device, such as a smartphone or tabletcomputer, which may receive data or information to be processed. Thecomputer or microprocessor-based system may be coupled to another devicefrom which data/information to be processed in accordance with theclaimed method, or such data/information may be obtained from a separatestorage medium or network connection such as the Internet. Anillustrative computer system in respect of which the methods hereindescribed may be implemented is presented as a block diagram in FIG. 29.The illustrative computer system is denoted generally by referencenumeral 10 and includes a display 12, input devices in the form ofkeyboard 14 and pointing device 16, computer 18 and external devices 30.While pointing device is depicted as a mouse, it will be appreciatedthat other types of pointing device may also be used.

The computer may contain one or more processors or microprocessors, suchas a central processing unit (CPU) 22. The CPU performs arithmeticcalculations and control functions to execute software stored in aninternal memory 26, preferably random access memory (RAM) and/or readonly memory (ROM), and possibly additional memory 32. The additionalmemory may include, for example, mass memory storage, hard disk drives,optical disk drives (including CD and DVD drives), magnetic disk drives,magnetic tape drives (including LTO, DLT, DAT and DCC), flash drives,program cartridges and cartridge interfaces, removable memory chips suchas EPROM or PROM, emerging storage media, such as holographic storage,or similar storage media as known in the art. This additional memory maybe physically internal to the computer, external, or both. Informationutilized in the present methods may be stored on the internal memory 26or the additional memory 32, for example, tissue expression data andnucleic acid sequence data, such as the Exome Sequence Project (ESP)data, human brain microarray data, and human brain RNA Sequencing(RNA-seq) data. The computer system may also include other similar meansfor allowing computer programs or other instructions to be loaded. Suchmeans can include, for example, a communications interface 34 whichallows software and data to be transferred between the computer systemand external systems and networks. Examples of communications interfaceinclude a modem, a network interface such as an Ethernet card, awireless communication interface, or a serial or parallel communicationsport. Software and data transferred via communications interface are inthe form of signals which can be electronic, acoustic, electromagnetic,optical or other signals capable of being received by communicationsinterface. Multiple interfaces, of course, may be provided on a singlecomputer system.

Input and output to and from the computer is administered by theinput/output (I/O) interface 20. This I/O interface administers controlof the display, keyboard, external devices and other such components ofthe computer system. The computer will generally include a graphicalprocessing unit (GPU) 24 useful for computational purposes as an adjunctto, or instead of, the CPU 22, for mathematical calculations.

The various components of the computer system are coupled to one anothereither directly or by coupling to suitable buses.

The methods described herein may be provided as computer programproducts comprising a tangible computer readable storage medium, such asnon-volatile memory, having computer readable program code embodiedtherewith for executing the method. Thus, the non-volatile memory wouldcontain instructions which, when executed by a processor, cause thecomputing device to execute the relevant method.

The above systems and methods may be implemented entirely in hardware,entirely in software, or by way of a combination of hardware andsoftware. In a preferred embodiment, implementation is by way ofsoftware or a combination of hardware and software, which includes butis not limited to firmware, resident software, microcode, and the like.Furthermore, the above systems and methods may be implemented in theform of a computer program product accessible from a computer usable ornon-transitory computer readable medium providing program code for useby or in connection with a computer or any instruction execution system.In such embodiments, the computer program product may reside on acomputer usable or non-transitory computer readable medium in a computersuch as the memory of the onboard computer system of the smartphone, orthe memory of the computer, or on a computer usable or computer readablemedium external to the onboard computer system of the smartphone or thecomputer, or on any combination thereof.

An illustrative method 100 of the present invention performed by aprocessing device is presented as a block diagram in FIG. 30A. Themethod begins (block 101) with the optional step of conducting a geneco-expression analysis to identify relevant genes (e.g. weighted geneco-expression analysis—WGCNA), and then identifying exons which exhibitan expression level greater than the 75% percentile of exon expressionlevels in relevant genes of nucleic acid from one or more individualshaving a disease such as a neurosychiatric disorder (block 102). Then,the sequence of an identified exon is compared to the sequence of acorresponding exon from a healthy control (block 104). For rare or denovo sequence mutations detected (block 106), the method calculates theburden of rare or de novo mutations within the exon (block 108), andthen determines the correlation between expression level of theidentified exon and burden of de novo or rare mutations in the exon toidentify critical exons (block 110). The process then moves on to thenext identified exon (blocks 112, 114) and repeats the steps from blocks104 and 106. At block 106, if no rare or de novo sequence mutations aredetected, then the process moves on the next identified exon (blocks116, 118) and repeats the steps from blocks 104 and 106. Once the methodhas iterated through all of the identified exons (block 114; count=totalor block 118; count=total), then the process ends (block 120).

The information obtained from method 100 of the present invention maythen be utilized to diagnose disease in steps further performed by aprocessing device presented as a block diagram in FIG. 30B. Method 100is shown and begins (block 101) with an optional preliminary step of aweighted gene in co-expression network analysis (WGCNA) using proteinexpression to identify relevant genes, e.g. WGCNA (block 102). Thencritical-exons within relevant genes for a tissue are identified inwhich exon mRNA expression level is greater than the 75% percentile(compared to the genome), and rare non-synonymous mutation burden in thepopulation is <75% percentile (blocks 103-110 of FIG. 30A). For example,critical exons are identified which exhibit an expression level greaterthan the 75% percentile of exon expression levels, and which expressrare or de novo nonsynonymous variants in which the rare or de novomutation burden inversely correlates with the expression level. Themethod 100 is shown in FIG. 30B; however, once the information regardinggenes/mutations associated with disease causality have been determined,this method need not be repeated again in conjunction with diagnosticmethod 200 of FIG. 30B. Thus, method 200 may be conducted following andseparately from method 100.

From a disease cohort, a method 200 performed by a processing device isapplied to diagnose disease in a patient, based on the critical exonsidentified using method 100. Method 200 includes identifying the rare orde novo nonsynonymous variants in nucleic acid samples and identifyingthe exon comprising the variant (block 120). Next, the variant-impactedexon (e.g. mutant exon) is compared with indexed critical-exons (block122), e.g. identified by the method 100. If the mutant exon isdetermined to be a critical exon, the method may also identify theexpression and the tissue type of the mutant exon (determined to becritical) (block 124) to verify relevance of mutant exon. The method mayalso be used to identify the corresponding protein network for themutant exon gene (block 126). Thus, disease causality and/or diseasediagnosis, may be conducted based on the presence of one or morecritical exons using the present methods. The process may then move onto the next identified mutation and/or the next patient (blocks 128,130) and repeats the steps from blocks 120 and 122. At block 122, if acritical exon is not detected, then the process moves on the next rareor de novo sequence mutation and/or patient for comparison of mutationswith exons indexed as critical (blocks 132, 134) and repeats the stepsfrom blocks 120 and 122. Once the method has iterated through all of themutations (block 130; count=total or block 134; count=total), then theprocess ends (block 140).

Thus, the present method advantageously provides a means of identifyingdisease causality in complex developmental disorders, such asneuropsychiatric disorders, that will aid in the diagnosis and treatmentof such diseases. In particular, genes relevant to a neuropsychiatricdisorder may be identified, including the related sequence mutation(s)within a pertinent exon of the particular disease-related gene. Theidentification of disease causality in this manner lends to thediagnosis of neuropsychiatric disorders, and provides a basis fordeveloping treatments thereof by providing valid targets on whichpotential treatments may be based.

Embodiments of the present invention are described in the followingspecific examples which are not to be construed as limiting.

EXAMPLE 1 Materials and Methods Burden of Rare Missense Mutations

Data from the Exome Sequence Project (ESP)(http://evs.gs.washington.edu/EVS/, downloaded February 2013) initiatedby US National Health Heart, Lung and Blood Institute (NHLBI) was usedto calculate the burden of rare missense mutations in human populations.The data was processed by computer to identify variants according tosteps 104 and 106 of FIG. 30. Variants were obtained from 6,503 exomes(2,203 African Americans (AA) and 4,300 European Americans (EA)); eachexome had a median sequence coverage of at least 100× (Fu et al. Nature493, 216-220 (2013); published online EpubJan 10 (10.1038/nature11690).The composite RefSeq gene annotation (which includes all exons fromannotated isoforms) was used for the analysis and after excluding geneswith no variant calls (3,267 genes), a total of 20,417 genes (includingprotein coding and pseudo genes) remained for analysis. The number ofvariants that passed quality control was 1,284,453 in EA exomes and1,162,001 in AA exomes. Of these, the rare variants (<0.05 frequency)were 968,840 (75%) for EA and 600,174 (51%) for AA. The differences inrare variants between these two populations are highlighted in Fu et al.2013. Using a computer, the burden of rare missense mutation at the exonlevel for European populations using the genome analysis toolkit (GATK)variant annotation was determined according to step 108 of FIG. 30. Raremissense variants with a frequency <0.05 are a strong proxy for recent(mostly within the last 5,000-10,000 years) accelerated deleteriousmutation events in humans. For any exons within the composite gene model(hg19) (FIG. 1), variants annotated as missense for any gene isoformwere considered. A normalized burden of mutation was calculated from therare missense count for each exon e in the EA data:

$e_{n} = \frac{{rare}\mspace{14mu}{missense}\mspace{14mu}{count}}{e_{l}}$

where e_(n) is the normalized exon burden and |e_(l)| is the length ofexon e.

Expression Data A. Multiple Tissue Expression Data

Expression data was obtained from the Affymetrix GeneChip Human Exon 1.0ST array for 11 normal human tissues, each in triplicate (Gardina et al.BMC genomics 7, 325 (2006)10.1186/1471-2164-7-325). Tissues includedcerebellum, breast, heart, liver, muscle, kidney, thyroid, pancreas,prostate, spleen, and testis. Partek® Genomics Suite™ was used tocompute core meta-probe set signal intensity from the raw .CEL. Exonswere kept for analysis only if they were also covered in the exomesequencing project for variant calls. Probes prone to multiplehybridizations were excluded. The Robust Multi-array Average (RMA)algorithm (Irizarry et al. Nucleic acids research 31, e15 (2003)) wasused for background signal subtraction; normalization and the log2expression value was computed for each exon. A total of 16,713 RefSeqgenes (covering 141,224 exons) were surveyed in all 11 tissues. Using athreshold of log2-transformed intensity ≥6 to define expressivity(Sanders et al. Nature 485, 237-241 (2012)), 16,411 genes were detectedwith at least one exon expressed for a tissue sample.

Hierarchical clustering with Euclidean distance is a method to detectoverall variation among replicates. This method typically finds groupsof genes with a similar expression pattern. Then, a dendogram can beconstructed to illustrate the relationship between replicates. Thisapproach demonstrated that the replicates were clustered stronglyaccording to tissue type (FIG. 5A); testis and cerebellum showed similarexpression patterns, different from the other tissues. A second methodwas applied—principal component analysis (PCA)—to capture overallvariance among the tissues. PCA analysis also showed that tissues highlyclustered by tissue type (FIG. 5B), with testis and cerebellum showinggreater variance compared to other tissues.

B. Human Brain Microarray Data

The spatio-temporal expression profiles from developmental human brainused in this study were from the BrainSpan database(http://www.brainspan.org/static/download.html) (Sanders et al. 2012).The samples were chosen so that each developmental period comprised atleast two age- and sex-matched donors (Table 1). The developmentalperiods were categorized into three groups: prenatal (8 post-conceptionweeks (pcw) to 21pcw), early childhood (4 months to 3 years) andadulthood (≥13 years). For each donor, expression data was obtained fromat least 13 brain regions (Table 1 and 2). These included neocortex(NCX) (11 NCX regions includes orbital frontal cortex (OFC),dorsolateral prefrontal cortex (DFC), ventrolateral frontal cortex(VFC), medial prefrontal cortex (MFC), primary motor cortex (M1C),primary somatosensory cortex (S1C), posterior inferior parietal cortex(IPC), primary auditory cortx (A1C), posterior superior temporal cortex(STC), inferolateral temporal cortex (ITC), primary visual cortex(V1C)), amygdala (AMY), cerebellar cortex (CBC), hippocampus (HIP),mediodorsal nucleus of the thalamus (MD), and striatum (STR).

TABLE 1 Spatio-temporal Transcriptome (microarray and RNA-SEQ) data.Human development periods and sample brain tissues from 13 postmortemdonors. Age/ Analyzed Brain Region Nomenclature Period Category Male(M)Female(F) M F 8 pcw/ Embryonic 13058 — AMY, CGE, DFC, DTH, HIP, —Emblyonic ITC, LGE, MFC, MGE, M1C- S1C, Ocx, OFC, PCx, STC, URL,VFC 13pcw/ Prenatal 12820 12834 A1C, AMY, DFC, HIP, IPC, A1C, CB, DFC, HIP,IPC, ITC, Early Fetal ITC, M1C, MFC, OFC, S1C, M1C, MD, MFC, OFC, S1C,STC, STR, VFC STC, STR, V1C, VFC 21 pcw/Late Prenatal 12886 12365 A1C,AMY, CBC, DFC, HIP, A1C, AMY, CBC, DFC, HIP, Fetal IPC, ITC, M1C, MD,MFC, OFC, IPC, ITC, M1C, MD, MFC, S1C, STC, STR, V1C, VFC OFC, S1C, STC,STR, V1C, VFC 4-6 Mo/ Early 12296 12297 A1C, AMY, CBC, DFC, HIP, A1C,AMY, CBC, DFC, HIP, Early Childhood IPC, ITC, M1C, MD, MFC, IPC, ITC,M1C, MD, MFC, Infancy OFC, S1C, STC, STR, V1C, VFC OFC, S1C, STC, STR,V1C, VFC 3 Y/ Early 12980 12836 A1C, AMY, CBC, DFC, HIP, A1C, AMY, CBC,HIP, IPC, Early Childhood IPC, ITC, M1C, MFC, OFC, ITC, M1C, MD, S1C,STC, STR, Childhood S1C, STC, V1C, VFC V1C, VFC 13-15 Y/ Adult 1229912831 A1C, AMY, CBC, DFC, IPC, A1C, AMY, CBC, DFC, HIP, Adolescence ITC,M1C, MD, MFC, OFC, IPC, ITC, M1C, MD, MFC, S1C, STC, STR, V1C, VFC OFC,S1C, STC, STR, V1C, VFC 37-40 Y/ Adult 12303 12304 A1C, AMY, DFC, HIP,IPC, A1C, AMY, DFC, HIP, IPC, Adulthood ITC, M1C, MD, MFC, OFC, S1C,ITC, M1C, MD, MFC, OFC, S1C, STC, STR, V1C, VFC, STC, STR, V1C, VFC pcw= post-conceptional weeks; Mo = months; Y = years.

TABLE 2 Brain region ontology and nomenclature used in this study. BrainRegion Analysed Region Name AMY amygdaloid complex CGE caudal ganglioniceminence DFC dorsolateral prefrontal cortex DTH dorsal thalamus HIPhippocampus ITC inferolateral temporal cortex LGE lateral ganglioniceminence MFC medial prefrontal cortex MGE medial ganglionic eminenceM1C-S1C primary motor-sensory cortex Ocx occipital neocortex OFC orbitalfrontal cortex PCx parietal neocortex STC posterior (caudal) superiortemporal cortex URL upper (rostral) rhombic lip VFC ventrolateralprefrontal cortex A1C primary auditory cortex IPC posterior inferiorparietal cortex M1C primary motor cortex S1C primary somatosensorycortex STR striatum CB cerebellum MD mediodorsal nucleus of thalamus V1Cprimary visual cortex CBC cerebellar cortex

RNA from 196 tissue samples from 13 post mortem donors were run on theAffymetrix GeneChip Human Exon 1.0 ST array platform. This data setsurveyed 17,296 genes, comprising 230,881 exons, from which at least oneexon was expressed (log₂-transformed signal intensity of ≥6) in at leastone brain region. The quality control (QC) measures and the microarraydata processing can be found at Kang et al. Nature 478, 483-489 (2011).Further quantile normalization was applied across all samples. Exonswithout probes in the microarray, or if not covered in the NHLBI exomesequencing project, were also excluded. The ‘loadings’ of principlecomponent analysis (PCA) are correlation coefficients between theprinciple component scores and the original variable, and reflect theimportance of each variable in accounting for the variability in theprinciple component. A PCA box plot was constructed (FIG. 2) for thegene sets and loading vectors extracted from genome-wide spatio-temporalPCA analysis on 196 brain samples (with gene as object). It showed thegreatest variation (captured as the first principal component (PC1)loadings) between prenatal (embryonic, early and late prenatal) andpostnatal (early childhood and adult) transcriptomes. The secondcomponent (PC2) loadings captured variation between embryonic and earlychildhood expression of genes. The analysis from both PC1 and 2 showsCBC with highest variance among all brain regions analyzed.

C. Human Brain RNA Sequencing (RNA-seq) Data

RNA sequencing (RNA-seq) data was derived from the BrainSpan database.These data contained transcriptome profiles from the same 196 tissuesfrom 13 postmortem donors as used in the microarray analysis. Methoddetails for sequencing, alignment, quality control and expressionquantification can be found in the BrainSpan Technical White Paper 2011(http://www.brainspan.org/). The expression measures were provided forexons as reads per kilobase (kb) per million (RPKM) from mapped reads.Quantile normalization was applied to the computed RPKM to reducesystematic bias across the samples. Exons were excluded if not coveredin the exome sequencing project. RNA-seq data were used primarily tovalidate association significance (between exon expression level andburden of rare missense mutations) that was observed by microarray.

3. Candidacy Tier of Genes for Neuropsychiatric Phenotypes A. De NovoSingle Nucleotide Variants (SNV) and Indels

To validate the analysis, a set of de novo mutations that are highlylikely to give rise to the neurological conditions under study (i.e.,ASD or other) were used. Hence, 243 genes with 256 de novo mutations(nonsense, frameshift, splicing, missense) that are validated andpredicted (using multiple algorithms) to be highly damaging, from ninewhole genome/exome sequencing (WGS/WES) studies were curated. Theseinclude 214 genes from ASD studies; 14 from a schizophrenia study; and15 from intellectual disability studies. 82 genes with 84 de novopossibly damaging exonic mutations from two large studies were compiledthat used the Simon Simplex Collection (SSC). SSC is a phenotypicallycomprehensive resource of simplex pedigrees that consists of ASDprobands and their unaffected siblings.

B. De Novo Copy Number Variation (CNV)

de novo CNVs were collected from three published data sets (Pinto et al.Nature 466, 368-372 (2010); Levy et al. Neuron 70, 886-897 (2011);Sanders et al. Neuron 70, 863-885 (2011)) and one unpublishedcase-control set (Autism Genome Project (AGP) stage 2). AGP Stage 1reported its genome wide finding with validated rare/de novo CNVs usingthe Illumina 1M-single array (Pinto et al. 2010). The ongoing stage 2uses Illumina 1M-Duo. The total number of ASD cases used (stage 1+2combined) was 2,446 and 2,640 controls. In the present analysis, 101validated de novo CNVs were used from the cases, which encompass 886unique genes. From stage 1 and 2 AGP population controls, genes withinthe rare CNVs (<0.01 frequency) were used for the analysis. In addition,de novo/rare losses impacting coding sequences for a single geneexclusively in ASD Probands were documented. Similarly, rare singlegenic losses observed exclusively in controls were ascertained (Table3).

TABLE 3 AGP dataset (from stage 1 and 2) consists of losses impactingcoding sequences for a single gene (de novo/rare). Rare + de novo(Single AGP Stage 1 +2 Genic CNV Loss) genes. ASD Cases 153 Controls 148

Genome wide de novo CNVs from two large well-studied SSC cohorts werecurated; SSC1 refers to 64 de novo CNVs (affecting 497 genes and SSC2refers to 75 de novo CNVs (affecting 993 genes). A significantlyincreased de novo CNV burden is documented for ASD probands. For the SSCcohorts, there were not enough validated rare/de novo CNVs reported forunaffected siblings for meaningful statistical analysis.

C. OMIM Dominant and Recessive Phenotypic Genes for Nervous System

Online Mendelian Inheritance in Man (OMIM) is a continuously updatedcatalog of human genes and genetic disorders and traits. The database(http://omim.org/) is a compendium of human genes and phenotypicinformation. The MIM number notation was used to infer the mode ofinheritance for all the genes in the Human Phenotype Ontology (HPO)database (harite.de/hudson/job/hpo.annotations.monthly/lastStableBuild/,accessed in April 2013). Genes with HPO phenotype annotation derivedfrom OMIM were only considered, the mode of inheritance could reliablybe inferred for these. OMIM identifiers corresponding to official genesymbols were obtained from the Bioconductor package (version 2.8),available for R statistical software (version 2.15.3). The analysis wasprimarily based on human phenotypes: “abnormality of the nervous system”(HP:0000707) which includes 319 autosomal dominant genes, and 487recessive genes.

D. FMRP and FOXP2 Target Genes

The loss of Fragile X mental retardation protein (FMRP) causes Fragile XSyndrome (FXS) which is the most common inherited form of intellectualdisability. FXS is the leading monogenic cause of ASDs; approximately50% of all FXS cases show autistic behavior. 842 gene products have beenidentified as FMRP targets. Multiple lines of evidence show asignificant enrichment of mutations in FMRP target genes among ASDprobands. The functional relevance of FMRP targets is also implicatedwithin the context of ASD pathogenesis. The surplus of pathogenicmutations among FMRP targets for cases relative to controls implies thatthe targets are undergoing strong purifying selection.

In contrast to FMRP, a subset (Table 4) of genes targeted by Forkheadbox P2 (FOXP2) protein show strong evidence of positive selection. FOXP2is a highly conserved gene implicated in human speech and languagedisorders. Its target genes have been curated from three studies andAyub et al. 2013 showed a subset of these (131 genes) to have strongpositive selection properties only in the European population (as usedin this analysis). These genes play important roles in the plasticity ofthe human developing brain.

TABLE 4 FMRP target genes; FOXP2 gene targets show evidence of positiveselection; and the reported ASD genes (autosomal dominant and X-linked).Category (Positive/Purifying Selection) Genes FMR1 Targets UnderPositive Selection 842 FOXP2 Targets Under Positive Selection 131 ASD(AD = autosomal dominant and XL = X-linked) 81

E. Genes for Post-Synaptic Proteome (PSP)

Bayés et al. (Nature neuroscience 14, 19-21 (2011)), isolatedpost-synaptic density from human neocortex and identified a large numberof proteins, termed the post-synaptic proteome (PSP). They identifiedmutations within the PSP genes as the cause of 133 neurological andpsychiatric diseases. Evidence from animal models (primate and rodent)also shows the importance of the PSP for nervous system disease andbehavior. All experimentally identified human post-synaptic densitygenes were used in the analysis (Table 4).

F. Single Genes Associated with ASD

A list was curated from studies of ASD families of mutations withMendelian inheritance (autosomal dominant (AD) or x-linked (XL)). Thislist consists of the best-studied bona fide ASD genes from multipleindependent studies.

4. Analysis A. Pearson Correlation

The Pearson correlation significance test for linear relationshipsbetween exonic expression and rare missense mutation burden was used.P-value<0.05 (after Bonferroni multiple test correction) was thethreshold for significance. Unless otherwise noted, the statisticalanalyses were performed using R statistical software, version 2.15(http://www.r-project.org/). The Wilcoxon association test (Bonferronicorrected) was applied to quantify spatio-temporal expressiondifferences for exons ascertained by virtue of mutation in probands andunaffected siblings.

B. Association

The association between exon expression level and burden of raremissense mutation using Fisher's exact test (FET) was tested. Theassociation was determined using a computer. To test the association,expression level was classified as high or low (above or below the75^(th) percentile, respectively). The percentile was computed fromexpression data for all samples, separately either from microarray (11tissues or spatio-temporal data from BrainSpan) or RNA sequencing,according to steps 102, 103 of FIG. 30A. A similar procedure was used toclassify the burden of rare missense mutations in EA data (FIG. 10),according to steps 104-108 of FIG. 30. The association was tested foreach tissue sample independently and corrected for multiple testswhereby the observed p-value was multiplied by the number of samples.

C. Splicing Code

Previous analysis showed within neocortex regions 57.7% of genes (withat least one exon) showed the presence of differentially expressed exonsin prenatal samples, whereas 9.1% and 0.7% of genes had differentiallyexpressed exons in early childhood and adulthood samples, respectively.One mechanism to produce differential expression is through alternativesplicing of exons. To analyze the implications of de novo SNVs(ascertained in proband and siblings) on alternative splicing patternsof affected genes, the ‘splicing code’ (Barash et al. Nature 465, 53-59(2010)) was run on the variants. The de novo mutations were first mappedonto canonical transcripts, and determined the target exon and itsflanking introns and immediate neighboring exons. Then, all splicingcode cis features were extracted for the mutation and the wildtypecounterpart, passed through the code, and the predicted change ininclusion of the target exon was computed. Exon inclusion, denoted by‘percent spliced-in’ (PSI Ψ), is defined as the fraction of times anexon is included in the final transcript. For each de novo mutation, thesplicing code predicts a ΔΨ; if a mutation's ΔΨ is larger than 95% of ΔΨfor common polymorphisms (single nucleotide polymorphism database (dbSNP)), then it was deemed significant. The ‘splicing code’ analysisfound only 73 de novo mutations (i.e. from probands and siblings) topass the ΔΨ threshold of −0.01 (Table 10). The code-predicted ΔΨs showedno significant difference between de novo mutations of probands andsiblings (Kolmogorov-Smirnov (K-S) test and AUC; FIG. 8).

Next, applying two additional normalizations, it was analyzed whetherexpression of exons that can vary (according to developing time periodsor across tissues) are exons that exclusively contributed to theassociation between spatio-temporal expression and burden of raremissense. For each gene (g), differential expression was computed byapplying two transformations on each exon, e_(d,i): 1) subtracting theexon expression e_(i) from the gene mean expression

$\left. {e_{d,i} = {e_{i} - {\frac{1}{n}{\sum\limits_{i}^{n}{{g\left( e_{i} \right)}\mspace{14mu}{and}\mspace{14mu} 2}}}}} \right)$

subtracting the total expression from the gene median expressione_(d,i)=e_(i)−median(g(e)). After applying both normalizationsindependently to the Brain-Span data, the association test for geneswith de novo SNVs (as ascertained from proband or unaffected siblings)using spatio-temporal brain samples was not significant. This indicatedthat these are not the only exons contributing to the association signalthat is consistent with the ‘splicing code’ analysis.

D. Detection of ‘Brain-Critical Exons’

An algorithm was derived to detect ‘brain-critical exons’, utilizing themultiple tissue microarray expression indices of an exon. Exonsclassified separately for each dataset into class d (for each dataset75^(th) percentile was used to construct contingency table in FIG. 1)were termed as ‘critical exons’ reflecting ‘high’ expression level andlow' burden of rare missense mutation, according to steps 102 to 110 asshown in FIG. 30. To identify those exons specifically ‘critical’ inbrain (or ‘brain-critical exons’), the array matched (AffymetrixGeneChip Human Exon 1.0 ST) multiple tissue data set (ten non-braintissue replicates) was used. First, all class d ‘critical exons’ frombrain tissues (Brain-Span samples and cerebellum microarray replicates)were identified. From this exon set C, exons were excluded that were notin class d for at least 10% of the brain samples for a given donor(BrainSpan data), or in a cerebellum replicate. Next, exons from set Cwere excluded that were in class d (for at least one tissue replicate)for ten other non-brain tissues. Therefore, after exclusions, set C′comprised exons with consistently high expression (above 75^(th)percentile) and low burden of deleterious mutations, in brain tissue butnot in other tissues. Similarly, tissue-specific ‘critical exons’ wereidentified from the other tissue types. 15,872 tissue-specific criticalexons, of which 3,955 were specific to brain were detected (FIG. 18).

A de novo mutation within one of these ‘brain-critical exons’ (or inother exons that can indirectly affect (truncate) such exons) could havean adverse effect on brain function while being relativelyinconsequential in other tissues. Brain showed at least 2 fold more‘critical exons’ compared to other tissues (except testis). A highnumber of ‘testis-critical exons’ were also observed. The functionalenrichment analysis revealed that the genes corresponding to these‘brain-critical exons’ are primarily associated with neuronaldevelopment, signaling networks and functions that regulate synapsedevelopment and plasticity. Next, each gene (g),was assigned a score

${S(g)} = \frac{\#{brain}\mspace{14mu}{critical}\mspace{14mu}{exon}\mspace{14mu}{in}\mspace{14mu}(g)}{{mean}\mspace{14mu}{brain}\mspace{14mu}{critical}\mspace{14mu}{exon}}$

where mean brain-critical exon was computed from all genes in the genomeanalyzed (FIG. 18).

E. ‘Brain-Critical Exons’ Enrichment Test

Enrichment of brain-critical exons on candidate gene sets that arehighly related to key biological processes in brain and ASD or otherco-morbid disease pathogenesis were assayed. FET was first used toidentify the sets over-represented in two backgrounds: 1) all exons werecompared (first background) to identify ‘brain-critical exons’ overlapsand 2) restricted the test only to highly (all exons expression above75^(th) percentile for a brain sample in developmental human brains)brain-expressed exons (second background). Although the secondbackground tested a narrow hypothesis, this approach assumes a uniformgene sampling probability, which may not reflect accurately differencesin gene exon density and composition. Therefore, an empirical testcomparing the percentage of genes with at least one critical exon thatare also gene-set members versus the percentage obtained by random exonsampling (first background) was also used; this test was also performedsampling only from brain-expressed exons (second background).

Permutation analysis on FMRP targets, PSP, ASD (AD/XL) and de novo ASDgenes was conducted by randomly drawn equal numbers of exons (i.e. 3,955exons randomly drawn 10,000 times from two backgrounds: RefSeqbackground and second background with exons highly expressed andanalysed the gene set enrichment under these premises. The empiricalp-values were computed from 10,000 random draws and false discovery rate(FDR) was computed using the Benjamini-Hochberg method (46).

F. Reverse Transcription Polymerase Chain Reaction (RT-PCR) andQuantitative RT-PCR (qRT-PCR)

The mRNA expression and alternative splicing events in selected geneswere analyzed by RT-PCR and qRT-PCR. For detecting splicing events,primer pairs were designed to prime from the flanking exons (exons 3 to12 of SLC6A1 and exons 16 to 18 for EPB41L3 gene) and different primerswere combined to capture most possible exclusion events (for SLC6A1)(Table 5). Exclusion events were confirmed by Sanger sequencing. RNAsamples (BD Biosciences) were used from whole adult (A) or fetal (F)human brains as templates for cDNA synthesis using Superscript III Firststrand Synthesis Supermix (Invitrogen). RT-PCR was performed usingstandard PCR conditions.

TABLE 5 Primer sequences for alternative splicingand for relative expression analysis by RT-PCR. Target Inclu- Exclu- De-region F R sion^(a) sion^(b) tected^(c) SLC6A1 TGACA ACAGG 503 bp —503 bp, exon3- TCGCA TAGTA 371 bp, 5 CGTCC AATGG 313 bp, ATCTG CCCAG181 bp (SEQ ID G NO: 1) (SEQ ID NO: 2) SLC6A1 GAGCC ACTGT 252 bp —252 bp exon4- TTCC TTCCA 6 TGAT CGGCA CCCCT GTGT AT (SEQ ID (SEQ IDNO: 4) NO: 3) SLC6A1 TGACA ACTGT 539 bp — 539 bp, exon3- TCGC TTCCA407 bp, 6 ACGTC CGGCA 349 bp, CATC GTGT 217 bp TG (SEQ ID (SEQ ID NO: 4)NO: 1) SLC6A1 CGGCT CAGCC 322 bp — 322 bp exon5- GCTG AACAC 7 TGCTACCTTC TCAT CAGAT TC (SEQ ID (SEQ ID NO: 6) NO: 5) SLC6A1 GAGCC CAGCC466 bp — 466 bp exon4- TTCC AACAC 7 TGATC CCTTC CCCT CAGAT AT (SEQ ID(SEQ ID NO: 6) NO: 3) SLC6A1 GGCTG GGAAA 330 bp — 330 bp exon7- GATAGAGTT 9 AGCCA GTAGCT GGTC CCCGA AG (SEQ ID (SEQ ID NO: 8) NO: 7) SLC6A1CCTGT CATGA 284 bp — 284 bp exon8- TCTT AGCCC 10 CCGTG ACGAT GAGT GGAGAGA\ (SEQ ID (SEQ ID NO: 10) NO: 9) SLC6A1 GGCTG TACTC 628 bp — 628 bpexon7- GATA ATCCA 10 AGCCA CCAGG GGTC GCTGT AG (SEQ ID (SEQ ID NO: 12)NO: 11) EPB41L3 CCTCA GTCTC 395 bp, 284 bp, 395 bp, Exon16- ACAG CCGCC272 bp 272 bp 18 ACACT GAGTA GCCG AGAAG TA (SEQ ID (SEQ ID NO: 14)NO: 13) ^(a)expected length in the longest isoform variant ^(b)expectedsize of any known exclusion events (Refseq) in the region. ^(c)FIG. 3Cor FIG. 12 RT-PCR detection bands length.

TABLE 6 Primer sequences for relative expressionquantification using quantitative real-time PCR (qRT-PCR) of brain specific critical exons located within16p11.2 CNVs.  Gene qRT-PCR primers (F/R) KCTD13_F CCTCATCCCCATGGTGACATC(SEQ ID NO: 15) KCTD13_R TTACTGCGGTTGTGCAGGAG (SEQ ID NO: 16) ASPHDl_FGACTTTGTCTTCGCCCCAGA (SEQ ID NO: 17) ASPHD1_R CTACCATCAAGCCGTCCCTG(SEQ ID NO: 18) SEZ6L2_F TCCGACATTCTCACTTGCCA (SEQ ID NO: 19) SEZ6L2_RCCAGGGTCAGCACAAGTCAT (SEQ ID NO: 20) PRRT2_F GGACTGTTCTCTGCTGGGAC(SEQ ID NO: 21) PRRT2_R GTTCGGAACCAGTACTCGCC (SEQ ID NO: 22) MED13_FCCGCATCCTGATGTGTCTGA (SEQ ID NO: 23) MED13_R TTGCAGGTGGATACGTGACT(SEQ ID NO: 24)

For the quantification of ‘brain-critical exons’ by qRT-PCR, primerswere designed to prime either from within the specific exon, or withinthe specific exon and the flanking exon (Table 6). All primers werefirst tested using RT-PCR with whole brain cDNA, -RT whole brain cDNAand non-target samples as templates. For validation of brain specificcritical exon (FIG. 18) expression from selected genes, RNA from 11human tissues (mammary gland, cerebellum, heart, kidney, liver, skeletalmuscle, pancreas, prostate, spleen, thyroid and testis (BD Biosciences)was used. DNase I-treated RNA was reverse-transcribed using SuperscriptIII First strand Synthesis Supermix (Invitrogen). RT-PCR was performedunder standard PCR conditions using 10 ng of cDNA as template. TheqRT-PCR assay was performed using Brilliant III SYBR® Green PCR MasterMix (Agilent) in a total reaction volume of 15 μl, containing 10 ng ofcDNA templates. The reactions were amplified using Mx3005P (Agilent).The expression of each gene of interest was normalized using ACTB orMED13 (dCt) and presented as relative expression (2{circumflex over( )}(−dC0). The primers were tested for their PCR efficiency by dilutionstandard curve and for specificity with melting curve analysis.

5. Pathway Enrichment Analysis A. Construction of Gene Sets

Gene sets were derived from the manually curated gene ontology (GO) (Rpackage, version 2.8.0), pathways from National Cancer Institute at theNational Institutes of Health (NCI-NIH) (May 30, 2013), KyotoEncyclopedia of Genes and Genomes (KEGG) (May 30, 2013) and Reactome(June 3rd, 2013); websites as previously described. For the analysis,term annotated with more than 1000 genes and fewer than 50 genes wereexcluded from GO, NCI, KEGG and Reactome, resulting in a total of 4,138gene sets.

B. Gene Set Overrepresentation Analysis

To identify biologically meaningful pathways, an overrepresentationanalysis was performed using the gene sets described above. FETdetermined the significance of the enrichment for a given pathway. Thetest evaluates the enrichment of genes in a pathway against a commonbackground consisting of all the genes in the data set. FET was used toassess gene-set enrichment. After obtaining the null distribution ofP-values, FDR was computed using the Benj amini-Hochberg procedure. Agene-set was considered to be significantly enriched when FET p<1.0×10⁻³and FDR<0.01. This cut-off was used to construct and visually representa functional enrichment map (FIG. 15C).

C. Permutation Test

A permutation test was undertaken by randomly drawing equal number ofexons (3,955) and re-analysis of the data under the null-hypothesis toreveal the strength of enrichment association with the gene list. Ifthis procedure is performed a sufficient number of times, the resultingsets of p-values are presumed to be a reasonable approximation of thenull distribution of thep-values, given a test statistic t(observed) andnumber of permutations n and the test statistics in permuted datat(i,permutation); i=1, . . . n; (n=10,000 permutation).

$P = \frac{\#\left( {{{abs}\left( {t\left( {i,{permutation}} \right)} \right)} \geq {{abs}\left( {t({observed})} \right)}} \right)}{n}$

Permutation was conducted for molecular pathway analysis randomlydrawing an equal number of exons (equal to ‘brain-critical exons’ 3,955)from RefSeq genes and enrichment analysis was done on GO, NCI, KEGG andReactome gene sets. The empirical P-value was not significant after thepermutation test.

D. Visualization of Functional Enrichment Map

Gene sets that were significantly enriched with ‘brain-critical exons’(described above) were represented by constructing a network where eachgene-set is a node, and the edges represent gene overlap between sets.The Cytoscape network software v.2.8.3 and the plugin “Enrichment Map”were used to construct and visualize the network. The node size isproportional to the total number of genes in the respective gene set.After constructing the enrichment map, the nodes were colored accordingto the p-value obtained from FET. The functional gene-set clusters weremanually identified and annotated, and highlighted by shadedovals/circles, representing groups of gene sets that are highlyoverlapping. Ovals/circles highlighted blue represents functionalpathways implicated in previous ASD or other neuropsychiatric disordersliterature; GTPase signaling, ERBB-NGF signaling pathways, neuronalsynapse , higher cognitive functions, voltage-gated channel, neuronprojection and axonogenesis, synaptic vesicle, brain development, neurondifferentiation and mitogen-activated protein kinase (MAPK) cascade.

Results

It was sought to exploit transcriptome maps of human developing braincoupled with data from population genetics studies about mutationalburden to examine the impact of genetic variants in ASD and otherneuropsychiatric disorders (FIG. 1 and FIG. 2). Exome sequencing hasrevealed an excess of damaging variants arising over the past 5,000years among individuals of European ancestry, but how this relates togene expression in developing human brain or to disease is littleunderstood. The initial examination of a few bona fide ASD risk genes(SHANK1, SHANK2, FMR1, AFF2, SCN2A) in control sample data revealed aninverse correlation between the burden of rare missense mutations(defined as the number of rare missense variants divided by the exonlength) and the expression levels for individual exons in prenatal brain(FIG. 3A and FIG. 4).

It was then hypothesized that due to adverse functional consequences,the accumulation of potentially deleterious mutations is suppressed for‘critical exons’ demonstrating higher expression in brain. It wasundertaken to test whether dysregulation of the expression of suchcritical exons caused by rare mutations was more likely to cause diseasethan other exons where recent mutational burden was relaxed regardlessof gene expression. Exons were classifed in two ways to construct anexon transcriptome/mutation contingency index: (i) exon expressionlevels were discretized into high and low classes (above or below75^(th) percentile of the entire data, respectively) and (ii) raremissense mutation burden was defined as the number of rare missensevariants divided by the exon length and defined as high or low in thesame manner (FIG. 1). With expression data from 11 human tissues (eachin triplicate (FIG. 5)) including 16,713 RefSeq genes, genome-wideassociation between the burden of rare missense mutations and exonicexpression was determined; brain cerebellum showed a strikingassociation (Fisher's exact test (FET), Bonferroni corrected,p<2.53×10⁻⁶⁷) (FIG. 3B).

To examine the relevance of the observations in relation to disease, itwas tested whether genes found (by sequencing or microarrays) inselected datasets to carry de novo mutations (SNV/indel or CNV,respectively) predicted to be deleterious, exhibit a similar inversecorrelation with spatio-temporal exonic expression. From whole exome orgenome sequence datasets of index cases of ASD, SZ or ID, 243 genescontaining 256 de novo mutations predicted to be deleterious werecollected. For the ASD cases from the Simon's Simplex Collection (SSC),equivalent sequence data exists for unaffected sibling controls,identifying 82 genes with de novo exonic mutations. Exon expression datawas obtained from both microarrays and RNA-sequencing of 196 normalhuman brain tissue samples.

The initial analysis between the ASD case and sibling genes showedidentical conservation scores and no significant difference in thedistribution of the burden of rare missense mutations (FIG. 6-7). Inaddition, when it wastested if the de novo mutations in these geneinduce aberrant-splicing pattern using the ‘splicing code’, again nosignificant difference was found (FIG. 8). However, the inversecorrelation was found to be strikingly significant withinspatio-temporal brain microarray-based transcriptome datasets (i.e. fromcontrol samples) for genes with de novo mutations ascertained in ASDprobands (Spearman's Correlation Test, Bonferroni corrected;p<2.0×10⁻¹⁶). No association was detected for genes ascertained throughunaffected siblings (FIG. 3C). A similar pattern of inverse correlationfor ID and SZ de novo mutated genes was also observed (FIG. 3C).Applying a 75^(th) percentile contingency index, the strongestassociation of genes in ASD cases (FET, Bonferroni corrected;p<1.13×10⁻³⁸; OR=2.40) was observed for the expression levels inprenatal neocortex samples (FIG. 9A); no association was observed in anybrain expression data within ASD siblings genes. The association for theASD case genes was validated using RNA-sequencing transcriptome data(FIG. 10).

When restricting the analysis only to exons with de novo mutationsascertained from ASD cases and siblings, significant differences wereobserved in expression pattern with respect to high and low raremissense mutation burden (FIG. 9B), predominantly for prenatal samples(Wilcoxon test, Bonferroni corrected; p<6.06×10⁻⁴⁰ ). In one example,reverse transcription-polymerase chain reaction (RT-PCR) confirmed theinverse correlation of missense burden and expression level of exons 5and 9 in SLC6A1 (for which de novo mutations are reported in ASDprobands) including evidence that these are not subject to splicing innormal prenatal or adult brain (FIG. 9C). However, the exons with denovo mutations found in unaffected siblings have a differentnon-specific pattern (FIG. 9B and 11). It was confirmed that exons proneto differential usage do not represent a significant determinant drivingour association signal.

When restricting the analysis to exons with de novo mutationsascertained in ASD cases and in siblings without ASD, 37.3% and 32.2% ofthe 196 brain samples, respectively, showed high expression (above the75th percentile) (FIG. 9A). The presence of a roughly equal rate of denovo mutations in cases and siblings within these exons that are highlyexpressed in the brain makes it difficult to distinguish actualmutational effects. However, when the transcriptome-mutation contingencyindex was applied, a difference was found (predominantly for prenataltranscriptome samples), whereby 116 of 256 (45%) exons affected by denovo mutation in cases and 24 of 84 (28%) exons affected by de novomutation in unaffected siblings were classified as critical exons. Incontrast, a two-fold excess of de novo mutations classified as‘tolerated’ was found in siblings (FIG. 10D). Consideringloss-of-function mutations only, the enrichment was three-fold forcritical exons in cases (FIG. 10E).

Building from theASD, ID and SZ findings, a more general grouping ofgenes involved in brain disorders was tested (termed nervous systemabnormalities in the Online Mendelian Inheritance for Man) for a similarinverse correlation with exon expression. A positive association wasobserved (p<1.2×10⁻⁹; OR=1.73) in cerebellum (amongst 11 tissuesexamined) for autosomal dominant disease genes (FIG. 12). Furtheranalysis revealed an association in genes under purifying selection(FIG. 13).

For CNV analysis, de novo calls from the Autism Genome Project (AGP)Consortium (Stage 1 and 2 and two independent SSC cohorts (SSC1 andSSC2) were used. The combined AGP stage 1 and 2 data (2,446 ASD casesand 2,640 controls), included 101 de novo validated CNVs (losses andgains) (affecting 886 genes in cases and 544 genes in controls). The SSCcohort data included 64 de novo calls (affecting 497 genes) in SSC1 and75 de novo CNVs (affecting 993 genes) from SSC2. As with findings for denovo SNV/indel mutations, striking associations were observed betweenmicroarray-based expression and rare missense burden in controlspatio-temporal brain tissues for genes ascertained through de novo CNVsin ASD cases (FIGS. 14 and 15A). The strongest association was found forprenatal neocortex samples (FET, Bonferroni corrected, p<8.06×10⁻¹⁶;OR=1.47). The lack of significant association for CNVs ascertained incontrols implies that only de novo CNVs from cases harbor genes enrichedwith exons that follow the correlation principal between expression andrare missense burden. The association using RNA-sequencingtranscriptomes for all three CNV data sets was also validated (FIG. 16).

Since these de novo CNVs typically encompass more than one gene(average: AGP=9 genes; SSC1=8 genes; SSC2=13 genes) a core ofdose-sensitive genes were producing the signals. Thus, the analysis wasfurther restricted to CNVs involving single gene losses, and overallhigher odds ratios was observed for those ascertained in cases than incontrols (FIG. 17), reminiscent of what was observed using the de novoSNV and indel data.

To develop a resource for potential discovery of neuronal disease genes,the inverse correlation framework was applied to multiple transcriptomedatasets and 3,955 ‘critical exons’ (from 1,744 unique genes) with highexpression specific to brain and low rare mutation burden (2-foldgreater expression in brain over that of most other tissues—testis beingthe exception and similar to brain) were identified (FIG. 18 and Table7). Such exons are present in most transcripts of their respective genesand functional or splicing mutations in these regions may be manifest asvarious disorders of the brain. For example, RBFOX1 and its downstreamregulatory targets ATP2B1 and GRIN1 were identified in thisbrain-specific critical exon dataset and this pathway has recently beenfound to be dysregulated in ASD. Moreover, SHANK1, SHANK2, NRXN2, NRXN3,SCN2A, FMR1, AFF2 and AUTS2 genes, known to confer risk of ASD were alsofound to be enriched with brain-specific critical exons (transcriptomedata were not available for NRXN1 and SHANK1).

Biological pathways were analyzed using the FET to look for potentialenrichment (considering two backgrounds, first, entire RefSeq exons andsecond, only brain expressed exons) of the 3,955 critical exons incandidacy tier datasets comprised of Fragile X mental retardationprotein (FMRP) targets, the post-synaptic proteome (PSP), bona fide ASDrisk genes (autosomal dominant (AD) and X-linked (XL)), and genesaffected by predicted deleterious de novo mutations from ASD triostudies. Indeed, highly significant enrichment of critical exons wasdetected for the FMRP targets: 41.8% (Bonferroni corrected FET;p<2.91×10⁻¹⁵⁷, OR=9.52), PSP: 28.11% (p<7.190×10⁻⁵⁴; OR=4.43), ASD(AD/XL): 34.56% (p<3.40×10⁻¹¹; OR=6.08) and de novo mutation genes inASD probands: 25.5%(p<7.73×10⁻¹³; OR=3.31)(FIG. 15B). Empirical p-valuesfrom 10,000 permutations (from both background) also showed significanceafter correction for Benjamini-Hochberg false discovery rate (FDR). As anegative control, an equal number of exons (3,955) that are enriched forrare missense mutations (tolerated in control brain tissues) and highlyexpressed in brain were analyzed. Enrichment re-analysis showed nosignificance after conducting permutation tests on two backgrounds(Table 8).

TABLE 8 Enrichment Analysis for Exons Highly Enriched with Burden ofRare Missense Equal number of exons were analysed (3955 exons) whereburden of rare missense mutation is high. Following is the Fisher'sExact Test and empirical significance (using two background) ofenrichment after 10,000 permutation for FMRP, Post-synaptic Proteome,ASD (AD, XL) and de Empirical P Empirical Gene (10,000 P CategoriesBackground FET P Permutation) (Corrected) FMRP Refseq 6.70E−05 1 1 FMRPBrain 0.000677025 1 1 Expressed Post-Synaptic Refseq 0.03616252 0.97 1Proteome Post-Synaptic Brain 0.454409035 0.99 1 Proteome ExpressedASD(AD,XL) Refseq 0.801966304 0.73 1 ASD(AD,XL) Brain 0.887460535 1 1Expressed SNV Refseq 2.33E−06 0.5 1 SNV Brain 0.000677025 0.99 1Expressed

1,744 genes encompassing these 3,955 critical exons for enrichment inspecific functional or biological pathways (stringent cutoffs ofp<1.0×10⁻³; FDR of <0.01) were examined. Gene sets were downloaded frommultiple databases: gene ontology (GO), pathways from the NationalCancer Institute at the National Institutes of Health (NCI-NIH), KyotoEncyclopedia of Genes and Genomes (KEGG) and Reactome. FET was used todetermine the significance of the enrichment for a given pathway. Thetest evaluated the enrichment of genes in a pathway against a commonbackground consisting of all the genes in the data set. From the nulldistribution of p-values, FDR was computed using the Benjamini-Hochbergprocedure. A gene-set was considered to be significantly enriched whenFET p<1.0×10⁻³ and FDR<0.01. The results revealed a modular networkinvolving neuronal synapse regulation, neuron differentiation, signalingcomplexes and synaptic vesicles (FIG. 15C, Table 9). These findings,which were generated without prior knowledge of ASD disease genes,overlap with results obtained using other datasets derived from knownASD genes.

These 1,744 genes represent high priority candidates to besusceptibility loci for ASD and/or related disorders. Of these, 518(29.7%) genes have already been implicated as candidate genes forneuropsychiatric diseases (Table 7). It was therefore examined whetherthebrain-specific critical exon index might help to differentiate thespecific etiological gene(s) from mere bystanders co-located withingenomic CNV loci that are well documented to be involved in ASD. At the16p11.2 CNV (˜600 kb encompassing 29 genes) four genes (PRRT2, SEZ6L2,ASPHD1, KCTD13) were identified harboring critical exons (FIG. 19);three of these have been implicated within the smallest region ofoverlap identified by atypical cases with smaller deletions (FIG. 19).All four genes are implicated through functional and large phenotypicanalyses to be relevant to ASD or other neuropsychiatric phenotypes. Thepresent analysis identified other candidate genes for neuronalphenotypes from disease loci: 3q29 micro-duplication/deletion (PIGZ,DLG1), 17q21.31 micro-deletion syndrome (CRHR1, MAPT), 17q11.2-NF1micro-deletion syndrome (NF1, RAB11FIP4) and 22q11.2micro-duplication/deletion syndrome (ARVCF, DGCR8, ZDHHC8, PI4KA) (FIGS.20 and 21).

The development of exon transcriptome/mutation contingency indexprovides a new approach to prioritize putative mutations for subsequentclinical- and functional-genetic analyses. Applying this strategypermits unprecedented specificity and clarity to re-analysis of mutationdata, helping to redefine the genes, sites within genes, and molecularpathways involved in the disease processes. These studies of ASD haveyielded the most significant data so far, most likely because of thegreater maturity of datasets available for this disorder. From anevolutionary perspective, there seems to be selective pressure forspecific neuronal genes (the data suggests at least 1,744 of them) tomaintain the integrity of particular exons (3,955). The data indicatesthat as mutations are identified and proven, through functional studies,to be etiologic for brain disorders—e.g. for ASD, they will be found tohave arisen in genes having the properties identified here. Moreover,the impact of upstream mutations that lead to some form of alteration ofthe ‘critical exon’, as well as more subtle mutations, with direct andindirect effects of modifier genes can also now be modeled within thisframework. Ultimately, exploiting this new understanding of the role ofthe brain transcriptome in modulating mutational impact will facilitatemore accurate genotype and phenotype studies in ASD and relateddisorders.

EXAMPLE 2

Copy number variants (CNVs) are associated with numerous neurocognitivedisorders. In a diagnostic setting, clinical microarray commonly usesISCA 4X180K arrays to detect such chromosomal abnormalities. Thevariants are classified primarily into three distinct groups:“pathogenic”, “variant of unknown significance (VOUS)”, or“benign/control”. However, “pathogenic” events are typically large, andthe underlying causative genes are unclear. The genotype-phenotypecorrelation is also complicated for VOUS variants due to lack ofadditional support to interpret the variant.

Using 7,106 samples from individuals with developmental delay (obtainedfrom the Hospital for Sick Children, Toronto, molecular diagnosticlaboratory), the samples were genotyped using the InternationalStandards for Cytogenomic Arrays (ISCA). CNVs were primarily classifiedinto the three groups mentioned above. 7,646 control CNVs detected fromother high-resolution microarray experiments were also utilized.

The data were analyzed as described in Example 1 using themutation-transcriptome contingency index method for ‘critical exon’enrichment. Using this method, 226,845 exons of the genome wereprocessed. For each exon, the rare missense/LOF mutation burden wascomputed, and transcriptome was quantified for each exon based onprenatal brain expression data from Brain Span database. In this cohort,the “pathogenic”, “VOUS” and “control” loss variants impacted 61,000,12,147, and 11,046 exons of the genome, respectively. The“critical-exon” enrichment analysis revealed strikingly significant (FETtest p-value) differences between pathogenic, VOUS and controls (FIG.22). The proportion of exons classified as ‘critical-exon’ withinpathogenic and VOUS groups was within ˜17-19%, whereas the controls wereclassified only for ˜7-9%. These findings are consistent with previousresults.

EXAMPLE 3

Clinical Microarray Datasets—Clinical microarray (CMA) data was obtainedfrom two independent sites, The Hospital for Sick Children (SK) andCredit Valley Hospital (CVH). A total of 7,106 and 3,513 cases CMA datawere obtained, respectively, that went through confirmed diagnosis fordevelopmental delay. ISCA 180K comparative genomic hybridization arraywas used to detect large CNVs by applying a circular binary segmentationalgorithm. For reference, a pool of 10 samples was used to compareindividual probe intensities. The clinical annotation for each samplevariant was conducted by the clinical geneticist in each site.

DNA extracted from peripheral blood was used to perform comparativegenomic hybridization array (aCGH) analysis on a custom designed 4×180Koligonucleotide microarray platform (Oxford Gene Technology (OGT),Oxford, UK). Microarray experiments were performed according to themanufacturer's instructions. Briefly, DNA from the proband and pooledsame-sex reference DNA (Promega, Madison, WI) were labeled with Cy3-dCTPand Cy5-dCTP, respectively, and were hybridized to the array slideaccording to the manufacturer's protocol (OGT). The arrays were scannedusing the Agilent G2505Bmicroarray scanner. Data analysis was performedusing the Agilent Feature Extraction software (10.7.11) and CytoSureInterpret Software version 3.4.3 (OGT). Clinical interpretation of copynumber variants was consistent with the ACMG guidelines. Parentalfollow-up studies were performed by FISH analysis on culturedlymphocytes using standard protocols. Metaphase chromosomes werecounter-stained with DAPI, and inverted grey scale imaging was used tovisualize chromosome banding patterns for chromosome identification,using the ISIS Metasystems imaging software version 5.5.4 (Newton,Mass., USA). Parental follow-up of deletions less than 200 kb andduplications less than 700 kb were performed by aCGH.

9,692 unrelated control samples were used from multiple major populationscale studies that used high-resolution microarray platform. Thesesamples did not have any obvious psychiatric history. The studiesinclude 4,347 control samples assayed in Illumina 1M from the Study ofAddiction Genetics and Environment (SAGE) and the Health, Aging, andBody Composition (HABC); 2,988 control samples assayed in Illumina Omni2.5M from COGEND and KORA projects; 2,357 control samples assayed inAffymetrix 6.0 from Ottawa Heart Institute (OHI) and PopGen project. Anadditional 11,255 control datasets assayed in Illumina platforms fromARIC and WTCC2 project were also incorporated.

Critical Exon Classification—For critical exon classification asdescribed in Example 1, the 1000 genome project for rare missense lossof function (LOF) mutation burden computation and transcriptome datafrom the human developmental brain atlas was used.

Burden of rare missense mutations—Data from the 1000 genome projectinitiated by the US National Health Heart, Lung and Blood Institute(NHLBI) was used to calculate the burden of rare missense mutations inhuman populations, including 1,039 whole genome sequencing samples (495males, and 544 females). Within these whole genome sequenced (WGS)samples, exonic regions had mean coverage of at least 20×. The RefSeqgene annotation model (which includes all exons from annotated isoforms)was used for the analysis. Genes with no variant calls were excluded. Asdescribed previously, variants were annotated using Annovar andconsidered rare missense and loss of function (LOF) variants as strongproxy for recent (mostly within the last 5,000-10,000 years) raredeleterious mutation events in humans.

Spatio-temporal Human Brain Expression—The normalized RNA-seq expressionprofiles of spatio-temporal developmental human brains were downloadedfrom the BrainSpan database(http://www.brainspan.org/static/download.html). 394 tissue samples from31 post mortem donors (prenatal and adult) were analyzed. The expressionmeasures were provided for exons as reads per kilobase (kb) per million(RPKM) from mapped reads. Method details for sequencing, alignment, QCand expression quantification can be found in the BrainSpan TechnicalWhite Paper (http://www.brainspan.org/). The spatial-temporal (prenataland adult) analysis was conducted on 16 brain regions, including 11neocortex regions (V1C, primary visual cortex; STC, posterior (caudal)superior temporal cortex; IPC, posterior inferior parietal cortex; A1C,primary auditory cortex; S1C, primary somatosensory cortex; M1C, primarymotor cortex; DFC, dorsolateral prefrontal cortex; MFC, medialprefrontal cortex; VFC, ventrolateral prefrontal cortex; OFC, orbitalfrontal cortex; ITC, inferolateral temporal cortex) and AMY, amygdaloidcomplex; CBC, cerebellar cortex; HIP, hippocampus; MD, mediodorsalnucleus of thalamus; and STR, striatum. To classify critical exons, the75^(th) percentile value was computed from the entire dataset and usedas a threshold to define exons with high and low expression, accordingto steps 102, 103 in FIG. 30A. Critical exon fraction was computed for agene or a group of genes (impacted by CNVs) by applying the 75^(th)percentile index on all exons. The fraction was computed by dividing thenumber of exons classified as critical exon over total number of exons.

Human Developmental Protein Expression Data—The protein expressionlevels for the genes were analyzed using high-resolution genome-wideFourier-transform mass spectrometry data (as described in Kim et al.Nature 509, 575-81 (2014) and downloaded from Human Proteome Map). Anin-depth proteomic profiling of 30 histologically normal human sampleswas conducted, including 17 adult tissues (lung, heart, liver, gallbladder, adrenal gland, kidney, urinary bladder, prostate, testis,ovary, rectum, colon, pancreas, oesophagus, retina, frontal cortex, andspinal cord) and 7 fetal tissues (liver, heart, brain, placenta, gut,ovary, testis). High-resolution Fourier transform mass spectrometers wasused for fragmentation (high-high mode) to process the data. The dataresulted in the identification of proteins encoded by 17,294 genesaccounting for approximately 84% of the total annotated protein-codinggenes in humans. Average spectral counts per gene per sample were usedas the measure for protein expression.

WGCNA Network details—Weighted gene co-expression network analysis(WGCNA) analysis was conducted by computer using human proteinexpression data in development, in accordance with step 102 of FIG. 30A.The R WGCNA package was used to conduct the analysis (as described inLangfelder et al. BMC Bioinformatics 9, 559 (2008); and Langfelder etal. J Stat Softly 46 (2012)). The use of weighted networks represents animprovement over unweighted networks because it preserves the continuousnature of the co-expression information and it is biologically robustwith respect to parameter β (as described in Zhang et al. Stat ApplGenet Mol Biol 4, Article17 (2005)). Proteins that are not expressed(expression=0) in at least 90% of the samples were excluded because suchlow-expressed features tend to reflect noise and correlations based oncounts that are generally zero and thus not meaningful. The absolutevalue of the Pearson correlation coefficient is calculated for allpair-wise comparisons of protein expression values across alldevelopmental tissue samples into a similarity matrix. Blockwise networkconstruction and module detection method were used where the clusteringof a block consisted of a maximum of 20,000 proteins. A signed adjacencymatrix was constructed using a “soft” power adjacency functiona_(ij)=|0.5+0.5* cor(x_(i), x_(j))|^(β) where the absolute value of thePearson correlation measures the protein co-expression similarity, anda_(ij) represents the resulting adjacency that measures the connectionstrengths. The soft thresholding beta=18 was chosen based on thescale-free topology criterion β for the analysis. Next, to computemodules, where the proteins have high “topological overlap”, connectionstrength between proteins in the network was compared. The parametersfor module detection used—were: minimum 30 proteins per module and amedium sensitivity deepsplit=2 was applied to cluster splitting. Theclustering of genes for modules used average linkage hierarchicalclustering and modules were identified in the resulting dendrogram bythe dynamic hybrid tree cut. Found modules were trimmed of genes whosecorrelation with module eigengene (KME) is less than a threshold definedby the function minKMEtoStay and for merging similar modules, 0.35 wasused as a threshold. The connectivity of each node i is the sum ofconnections to other nodes.

For visualizing the protein co-expression network, Cytoscape networksoftware v.2.8.3 was used. The nodes were represented by a circle andthe edge between the nodes implies the co-expression weighted Pearsondistance. The color of the node is representative of their membership toa phenotype.

Significant Test Analysis and Permutation Test—Fisher's exact test (FET)was used for all count data and gene enrichment test with p-value <0.05(after Bonferroni multiple test correction) was used as the thresholdfor significance. To reveal the strength of enrichment association withthe gene lists, a permutation test was conducted by randomly drawingequal numbers of genes and re-analyzing the data under thenull-hypothesis. The random draw was conducted from a background that isappropriate for the test. With sufficient iterations (100,000 times),the resulting sets of p-values were presumed to be a reasonableapproximation of the null distribution of the p-values.

Reverse Transcription Polymerase Chain Reaction (RT-PCR) andquantitative RT-PCR (qRT-PCR)—For the quantification of ‘critical exons’by qRT-PCR, primers were designed to prime from within the specific exon(Table 9). The primers were tested for their PCR efficiency by dilutionstandard curve and for specificity with melting curve analysis usingadult whole brain cDNA. To quantify the ‘critical exon’ expression fromselected genes, RNA was used from a panel of 11 human tissues: liver (BDBiosciences kidney (Stratagene), mammary gland (BD Biosciences),cerebellum (Clonetech), skeletal muscle (Stratagene), prostate(Clonetech), spleen (Stratagene), thyroid (Stratagene) and testis(Clonetech). Reverse transcription was performed using the SuperscriptIII First strand Synthesis Supermix (Invitrogen). Using 10 ng of cDNA astemplate, RT-PCR was performed under standard PCR conditionsusingBrilliant III SYBR® Green PCR Master Mix (Agilent) and the MX300software (Agilent). Gene expression was normalized using MED13 or ACTB(dCt) and quantified as relative expression (2{circumflex over( )}(−dCt)).

TABLE 9 Primer genes sequences for quantification of critical exonswithin MVB12B, PPP1R9A, and GIT1 Primer Name Sequence (5′-3′) MVB12B-FTTC ATC CCA ATT CAG GAG AC (SEQ ID NO 25) MVB12B-RCAT GAT CCG AAT GTC ACA AA (SEQ ID NO 26) PPP1R9A-FAGC AGG TTT CTC ACT GGT TA (SEQ ID NO: 27) PPP1R9A-RGAT GCT GTC ATT CCA AGA GC (SEQ ID NO: 28) GIT1-FGCC TTG ACT TAT CCG AAT TG (SEQ ID NO 29) GIT1-RACC TCG TCA TAC ACG TCC A (SEQ ID NO: 30) ACTB-FATT GCC GAC AGG ATG CAG A (SEQ ID NO 31) ACTB-RGAG TAC TTG CGC TCA GGA GGA (SEQ ID NO 32) MED13-FCCG CAT CCT GAT GTG TCT GA (SEQ ID NO: 23) MED13-RTTG CAG GTG GAT ACG TGA CT (SEQ ID NO: 24)

Results

10,619 cases were analyzed from two hospitals (The Hospital for SickChildren (SickKids) and Credit Valley Hospital (CVH), Ontario) referredfor clinical laboratory testing due to developmental delay. Also, 9,692controls where no known psychiatric condition has been reported for thesamples were assayed (FIG. 23). From this developmental delay cohort,10.15% of the samples carried a pathogenic CNV, and 50.25% had at leastone rare VOUS (variant of unknown significance) (FIG. 24A). In theSickKids subset (7,106 cases), which included inheritance information,there were 169 de novo variants from developmental delay cases (108deletions and 61 duplications), of which 64% (108/169) were interpretedas pathogenic and 36% (61/169) as VOUS. The analysis was restricted tovariants of 30 Kb to 5Mb in length. The developmental delay cohort was68.53% male and 31.46% female. There were pathogenic deletion variantsin 5.26% of females and 3.69% of males, and this difference was highlysignificant (p<0.0002) (FIG. 24B). This gender difference was not foundfor duplication CNVs. The average number of genes with exonic variantsdiffered significantly between pathogenic CNVs and VOUS. Genes pervariant averaged 38 for pathogenic duplications, 27 for pathogenicdeletions, 4.6 for VOUS duplications, and 3.2 for VOUS deletions. Theaverage number of genes in pathogenic deletions from females was higherthan in those from males. CNVs at loci for known genomic disorders weremore frequent in the developmental delay cohort relative to controls,for example 16p11.2 (0.86%), 15q13.3 (0.43%), Prader-Willi syndrome(0.52%) and 22q11 deletion/duplication syndrome (1.06%).

Genes within pathogenic or VOUS CNVs in this developmental delay cohort,had a significantly higher fraction of critical exons (computed over allexons impacted by CNVs), compared with genes in rare duplications ordeletions in controls. Gene sets from pathogenic CNVs and VOUS were verylarge, and overlapped those from control CNVs and so the analysis waslimited to genes impacted by pathogenic CNVs or VOUS from the casecohort, and not found in unaffected controls (FIG. 25). A strikingenrichment of critical exons (computed using prenatal braintranscriptome) within pathogenic deletions (corrected Fisher's ExactTest (FET within 16 brain regions ranged from P<1.64×10⁻¹⁵ to1.31×10⁻²⁹⁹) and VOUS deletions (P<1.31×10⁻²⁰ to 6.22×10⁻¹⁵⁸) wasobserved (FIG. 25A). This result strongly suggests that pathogenic andVOUS variants harbor more critical exons than the rare CNVs in controls.Although deletions showed the consistently highest sensitivity, asimilar increased critical exon fraction for pathogenic and VOUSduplications computed using prenatal and adult transcriptome wasobserved (FIG. 25A). Exclusively pathogenic and VOUS gene sets were thenused to identify candidates for effects on developmental disorders. Ofinterest, the fraction of critical exons was highest in prenatalneocortical regions (specifically medial prefrontal cortex (MFC)tissues) for genes impacted by either pathogenic or VOUS deletions. Thisobservation strongly supports previous independent reports of multipleneuropsychiatric patients with altered MFC functional activity.

To examine the biological relevance of pathogenic and VOUS ‘criticalexons’ at the protein level, a co-expression analysis was applied usingFourier transformed protein expression data. A draft map of the humanproteome reported by Kim et al. (Nature 509, 575-581) obtained from massspectrometry of 24 different human tissues (each pooled from 3post-mortem samples) including 17 adult and 7 prenatal, was used. Thisis the first such human developmental protein co-expression analysis.Unlike the biased seeding approach used to construct modules andnetworks, an unbiased construction of networks was implemented, based onspatio-temporal protein expression, by applying a weighted genecoexpression network analysis (WGCNA). WGCNA was applied to the 17,294genes represented by the proteome (Jiang et al. American journal ofhuman genetics, (2013); published online EpubJul 10(10.1016/j.ajhg.2013.06.012) to construct protein co-expressionnetworks. Genes not expressed in at least 90% of the tissues wereexcluded. The analysis revealed networks for 23 independent proteinexpression modules (FIG. 26A). To identify modules that are relevant todevelopmental delay, gene enrichment analysis was conducted using 18,826gene ontology (GO) terms, and retained the 20 most significant GO terms.From this analysis, the ‘blue’ module was identified (FIG. 26A), whichcomprised 2,484 genes, and was highly significant (P<1.0×10⁻³⁹ to1.0×10⁻⁸¹) for pathways involved in synaptic transmission, neuronprojection, cell signaling, nervous system development and axon guidance(FIG. 26B).

To develop a list of candidate genes related to developmental delay, thetwo approaches, critical exon and WGCNA core protein, were combined,analyzing genes from the protein-derived ‘blue’ module for critical exonenrichment. For each gene in the genome, the critical exons for prenataland for adult tissues were computed. To control for the genes with alarge number of exons, for each gene in the genome, the fraction ofcritical exons (over all exons) for a gene in each brain region werecomputed. For each developmental period (prenatal or adult), a gene wasdeemed to be significant if its critical exon fraction fell within thegenome' s top 25^(th) percentile for at least 50% of the brain samples(Table 10).

TABLE 10 Clinically relevant genes within known syndromic regions. BlueProtein Network and Critical Exon Total Enriched Syndrome CoordinateGene Genes Genes 16p11.2 chr16: 30 DOC2A, ASPHD1, DOC2A, micro-29606852- LOC440356, TAOK2, duplication 30199855 CORO1A, TBX6, PRRT2,syndrome LOC100271831, SEZ6L1, PRRT2, CDIPT, MAPK3, QPRT, YPEL3, ALDOASLC7A5P1, PPP4C, MAPK3, SPN, MVP, FAM57B, ZG16, ALDOA, INO80E, SEZ6L2,TAOK2, KCTD13, MAZ, KIF22, GDPD3, C16orf92, C16orf53, TMEM219, C16orf54,HIRIP3 Angelman chr15: 116 NIPA2, NIPA1, UBE3A, syndrome 23619912-SNORD116-9, GABRB3, (Type 1/2) 28438266 SNORD116-8, CYFIP1 SNORD116-5,SNORD116-4, SNORD116-7, SNORD116-6, SNORD116-1, SNORD116-3, SNORD1162,SNORD109A, SNORD109B, GOLGA8IP, PARSN, PWRN1, PWRN2, OCA2, LOC100128714,MIR4508, SNORD115-34, PAR5, PAR4, IPW, PAR1, GOLGA8E, SNORD116--19,SNORD11618, GABRG3, SNORD115-3, SNORD115-31, SNORD64, SNORD115-18,SNORD115-19, SNORD115-14, SNORD115-15, SNORD115-16, SNORD115-17,SNORD115-10, SNORD115-11, SNORD115-12, SNORD115-13, UBE3A, MAGEL2,ATP10A, C15orf2, SNORD115-21, LOC283683, SNORD115-23, SNORD115-22,SNORD115-25, SNORD115-8, SNORD107, SNORD115-26, SNORD115-29, SNORD108,SNORD116-16, SNORD115-20, HERC2, SNORD115-24, SNORD115-27, SNORD115-36,SNORD115-37, MKRN3, SNORD115-35, SNORD115-32, SNORD115-33, SNORD115-30,SNORD115-28, GABRA5, MIR4715, SNORD115-38, SNORD115-39, HERC2P2,HERC2P7, NDN, LOC503519, GABRB3, SNORD115-43, SNORD115-42, SNORD115-41,SNORD115-40, SNORD115-47, SNORD115-45, SNORD115-44, SNORD115-48,TUBGCP5, CYFIP1, SNORD116-24, SNORD116-25, SNORD116-26, SNORD116-27,SNORD116-20, SNORD116-21, SNORD116-22, SNORD116-23, SNORD116-28,SNORD116-29, SNORD115-6, SNORD115-7, SNORD115-4, SNORD115-5, SNORD115-2,SNURF, SNORD115-1, SNORD116-11, SNORD116-10, SNORD116-13, SNORD116-12,SNORD116-15, SNORD116-14, SNORD116-17, SNORD115-9, SNRPN, WHAMMP3,LOC653061 Williams- chr7: 26 STX1A, WBSCR27, CLIP2, Beuren 72744455-WBSCR22, LAT2, LIMK1 Syndrome 74142672 LIMK1, WBSCR28, (WBS) MIR4284,RFC2, and 7q11.23 FKBP6, MIR590, duplication FZD9, VPS37D, syndromeABHD11, CLIP2, CLDN3, CLDN4, BCL7B, ELN, MLXIPL, DNAJC30, GTF2IRD1,BAZ1B, TBL2, EIF4H, GTF2I, ABHD11-AS1 22q11 chr22: 62 P2RX6P, RIMBP3,CLDN5, Velo- 19009792- TMEM191A, CLTCL1, cardio- 21452445 PI4KA, KLHL22,SEPT5 ^(Ψ), facial/ SLC7A4, PI4K2 DiGeorge LOC388849, syndrome MIR185,GNB1L, TBX1, MIR3618, MIR1306, SEPT5, ZNF74, P2RX6, DGCR8, PI4KAP1,DGCR10, TMEM191B, DGCR2, GP1BB, LOC400891, C22orf39, C22orf25, DGCR6L,MED15, CRKL, TXNRD2, CLDN5, LOC150197, RTN4R, TSSKRKL, TXNRD2, CLDN5,LOC150197, RTN4R, TSSK2, GSC2, ARVCF, SLC25A1, MIR4761, COMT, LOC284865,LOC729444, AIFM3, CLTCL1, SERPIND1, THAP7- AS1, SCARF2, HIRA, THAP7,MIR1286, THAP7- AS1, SCARF2, HIRA, THAP7, MIR1286, RANBP1, POM121L4P,SNAP29, UFD1L, DGCR11, C22orf29, MRPL40, DGCR14, ZDHHC8, CDC45, TRMT2A,LZTR1, LOC150185, MGC16703, SEPT5- GP1BB 3q29 micro- chr3: 28 RNF168,NCBP2, PAK2^(Ψ) deletion/ 195726835- LOC100507086, duplication 197344663ZDHHC19, syndrome DLG1, TM4SF19- TCTEX1D2, TFRC, LOC152217, UBXN7,FBXO45, MIR4797, MF12, SENP5, OSTalpha, TCTEX1D2, PIGX, PIGZ, LOC220729,BDH1, PCYT1A, WDR53, LRRC33, MF12-AS1, C3orf43, LOC401109, TM4SF19,CEP19, PAK2 ^(Ψ)-deletrious point mutations or focal deletions have beenindependently reported in cases with developmental delay or relatedconditions.

48.5% (1,206/2,484) of ‘blue’ module proteins to be within the top25^(th) percentile of genes enriched for critical exons, both forprenatal (FET, P<1.15×10⁻⁵⁰, OR=2.11) and adult brain (FET,P<6.03×10⁻¹⁸, OR=1.55). This included genes (SCN2A(MIM: 182390),NRXN1(MIM: 600565), NRXN2 (MIM: 600566), NRXN3(MIM: 600567), SHANK2(MIM: 603290), NLGN3(MIM: 300336), STXBP1(MIM: 602926), NLGN4(MIM:300427)) known to be associated with neuropsychiatric conditions.Inference of these 1,206 candidate genes, as described, was independentof the knowledge of gene-disease association. The overrepresentation ofcritical exon genes within the ‘blue’ module was tested by randompermutation 100,000 times to obtain an empirical significance (for bothperiods, P<1.0×10⁻⁰⁵) (FIG. 26C-D). The blue module was then tested fromwithin for overrepresentation in tiers of genes that are associated withcertain neurodevelopmental disorders. There was significant enrichmentof ‘blue’ module genes with 840 fragile X mental retardation protein(FMRP) target genes (P<1.36×10⁻¹²⁸, OR=6.25; empirical P<1.0×10⁻⁰⁵), 246genes impacted with de novo LOF mutations in ASD cases (P<5.9×10⁻⁰⁴,OR=1.79; empirical P<2.1×10⁻⁰⁴) and 141 genes that had de novo exonicdeleterious mutations in intellectual disability samples (P<8.0×10⁻⁰³;OR=1.80; empirical P<2.3×10⁻⁰³) from exome and whole genome sequencing.This result suggested that proteins in this ‘blue’ module are underpurifying selection in brain, and that perturbing its components maylead to neurodevelopmental phenotypic manifestations.

Next, ‘blue’ module genes were searched for among those ascertained fromCNVs in the developmental delay cohort. “Blue” module genes weresignificantly overrepresented among genes within pathogenic deletions(P<2.97×10⁻⁰⁹; OR=1.58; empirical P<0.024) or duplications (P<0.002;OR=1.24; empirical P<0.006) (1 sided FET and permutation test). Theywere similarly overrepresented among genes within VOUS deletions(P<7.44×10⁻⁰⁷; OR=1.48; empirical P<0.002) and duplications (P<0.004;OR=1.14; empirical P<0.023) (FIG. 26E, F). In contrast, the genes withindeletions and duplications in controls were not overrepresented withinthe “blue” module. 438 candidate genes (of 1,206 genes) were found thatwere included in the ‘blue’ module and also highly enriched for criticalexons (top 25^(th) percentile), that had been ascertained by at leastone pathogenic variant or VOUS in the developmental delay cohort. Ofthese, 65 (14.84%) were seen in at least 2 VOUS deletions in cases butnot controls. Another 37 of these 438 genes were impacted by at leastone de novo VOUS in the dataset. Genes from large recurrent CNVs knownto be associated with neuropsychiatric syndromes (e.g., 16p11.2, 22q11,and 3q29) were represented in the ‘blue’ protein module and were highlyenriched for ‘critical exons’ in prenatal or adult brain (Table 10).

Through unbiased critical exon analysis coupled with WGCNA networkanalysis, 1,206 candidate genes were identified whose disruption islikely to contribute to neurodevelopmental delay. For example, VOUSdeletions and duplications in the cohort within chromosome region17q11.2 affect the gene for G protein-coupled receptor kinaseinteracting ArfGAP1 (GIT1) (MIM: 608434). Within the VOUS in thisregion, GIT1 was the only gene enriched with critical exons in prenatalbrain, and was highly clustered with genes (highly connected firstdegree neighbors) from within the ‘blue’ protein module network thatwere reported to have de novo mutations in ASD. The GIT1 protein isinvolved in cell migration, localizes in pre- and post-synapticterminals, and regulates synapse formation. Recent studies of knock-outGit1−/− mice showed a decreased brain size, with impaired motorcoordination and deficits in learning and memory. Follow-up analysis onadditional human cohorts (Mayo clinic, DatabasE of genomiC variation andPhenotype in Humans using Ensembl Resources (DECIPHER) and CentreHospitalier Universitaire Sainte-Justine clinic) found enrichment ofCNVs affecting GIT1 amoung cases (12 deletions and duplications,including 5 de novo; none in controls). The smallest focal deletion of180 kb was found in an 11-year-old child (FIG. 27A, Table 11, case 6D)referred for developmental delay with epilepsy and attention deficithyperactivity disorder as comorbid conditions. Another focal deletionwas found as a de novo event of 299 kb in a 10-year-old child referredfor developmental delay (case 1D-DN). The DECIPHER database showed asimilar de novo focal 282 kb deletion affecting a child referred forlearning disabilities, dysphasia and poor motor coordination (seephenotype Table 11 case 5D-DN). This source also identified two de novoduplications affecting GIT1: a 1.4 Mb de novo duplication in a patientreferred for intellectual disability, and a focal 466 kb duplication ina patient with broader developmental delay. A de novo damaging missensemutation was also reported in a schizophrenia case. Quantitativereal-time PCR (rt-PCR) analysis was conducted to quantify relative mRNAexpression analysis of GIT1 on 11 human tissues (including prenatal andadult brain) by targeting a critical exon of the gene. GIT1 expressionrelative to that of the MED13(MIM: 603808) gene (or to the ACTB(MIM:102630) gene) showed brain specific expression in prenatal and adulttissue (FIG. 27C).

TABLE 11 Phenotypic table for cases with developmental delay and CNVsimpacting GIT1, PPP1R9A, and MVB12B gene. The cases are listed only ifthe phenotypic information was available. Critical Other Case Exon CopyCNV Age of Developmental Dysmorphic Clinical ID Gene Size Number(Inheritance) Ascertainment Delay/ID Features Features 1D_DN GIT1 299Kb  Loss 17q11.2 10 yrs Developmental N/A 27.822 to 28.121 Mb delay (denovo) 6D GIT1 180 Kb  Loss 17q11.2 11 yrs Developmental Epilepsy, 27.733to 27.913 Mb delay ADHD 5D_DN GIT1 282 Kb  Loss 17q11.2 N/A LearningDysphasia 27.837 to 28.120 Mb disability (de novo) 4D_DN GIT1 5.3 MbLoss 17q11.2q12 1 yr N/A N/A 27.771 to 33.094 Mb (de novo) 7D_DN GIT13.5 Mb Loss 17q11.2 3 months — — Prematurity, 27.274 to 30.817 Mbtetralogy of (de novo) Fallot, bilateral choroid plexus cyst,imperforate anus, ambiguous genitalia 2D GIT1 3.1 Mb Loss 27.869 to31.043 Mb N/A Developmental N/A N/A delay and/or ID 3D GIT1 2.1 Mb Loss27.606 to 29.722 Mb N/A Developmental N/A N/A delay and/or ID 3G_DN GIT1466 kb   Gain 17q11.2 <1 yr N/A N/A 27.696 to 28.162 Mb (de novo) 5GGIT1 9.2 Mb Gain 17p11.2q12 14 yr ID Autism, 20.649 to 29.832 obesity11D_DN PPP1R9A 703 Kb  Loss 7q21.3 3 yrs Speech delay — Repetitive94.823 to 95.024 Mb behaviors (de novo) and sensory sensitivitiesconsistent with autism spectrum disorder 7D PPP1R9A   2 Mb Loss 7q21.312 yrs — Triangular Myoclonus 92.943 to 94.931 Mb facies, broaddystonia, forehead, short stature, thin lips. failure to thrive,anxiety, obsessive- compulsive behavior. 13D_DN PPP1R9A 4.8 Mb Loss7q21.2q21.3 4 yrs Fine and gross Microcephaly Short 92.388 to 97.197 Mbmotor delay, stature, (de novo) speech delay, ADHD, ID hypotonia, autism12D PPP1R9A 2.1 Mb Loss 7q21.3 21 months — Ear pit, helix Hyperplasia92.992 Mb to 95.058 of right leg; Mb Father has (Paternal) tremors dueto SGCE deletion. 9D PPP1R9A 1.3 Mb Loss 7q21.3 2 yrs — Short 94.909 to96.189 Mb stature, sensorineural hearing loss, congenital hipdislocation 4D PPP1R9A 5.9 Mb Loss 7q21.2q21.3 N/A ID Micrognathia Short91.997 to 97.905 Mb stature, congenital hip dislocation, short stature,sensorineural hearing loss 3D_DN PPP1R9A 6.8 Mb Loss 7q21.3q22.1 <1 yrID, speech Epicanthus, Ectrodactyly 94.174 to 10.101 Mb delayposteriorly (de novo) rotated ears 1D PPP1R9A 8.8 Mb Loss 93.184 to102.043 N/A Developmental N/A N/A Mb delay and/or ID 5D PPP1R9A 5.8 MbLoss 93.891 to 99.735 Mb N/A Developmental N/A N/A delay and/or ID 6DPPP1R9A 5.8 Mb Loss 89.836 to 95.635 Mb N/A Developmental N/A N/A delayand/or ID 10D PPP1R9A 1.1 Mb Loss 93.926 to 95.027 Mb N/A DevelopmentalN/A N/A delay and/or ID 5G_DN MVB12B 839 Kb  Gain 9q33.3 15 yrs LearningTourette 128.653 to 129.491 disability syndrome, Mb ADHD (de novo) 4G_DNMVB12B 471 Kb, Gains 9q33.3 N/A ID Unaffected 703 Kb 128.772 to 129.243niece has Mb 9q34.11 9q33.3 gain. 130.898 to 131.601 Mb (both de novo)6G_DN MVB12B 1.9 Mb Gains 9q33q34.11, 2 yrs N/A N/A 7.5 Mb 128.432 to130.352 Mb 9q34.11q34.3 130.825 to 138.309 Mb (both de novo) 1G MVB12B683 Kb  Gain 129.218 to 129.902 N/A Developmental N/A N/A Mb delayand/or ID 5D_DN MVB12B 330 Kb  Loss 9q33.3 N/A Global — 129.156 to129.490 Developmental Mb Delay (de novo) 7D_DN MVB12B 980 Kb  Loss9q33.3 5 yrs Developmental — Patellar 129.136 to 130.120 delay aplasiaMb (de novo) 6D_DN MVB12B 470 Kb, Loss 9q33.3 N/A Developmental — Autism700 Kb 128.772 to 129.243 Delay, ID Mb 9q34.11 130.898 to 131.601 (Bothde novo) 3D_DN MVB12B 1.2 Mb Loss 9q33.3 <1 yr N/A N/A 128.652 to129.871 Mb (de novo) 1D_DN MVB12B 2.8 Mb Loss 9q33.3q34.11 1 yr IDBrachycephaly, Feeding 128.460-131.260 microcephaly difficulties (denovo) at infancy 12D_DN MVB12B 3.6 Mb Loss 9q33.3q34.11 18 yrsDevelopmental — Seizures; 127.818-131.400 Mb delay deletion (de novo)includes STXBP1 2D_DN MVB12B 4.1 Mb Loss 9q33.3q34.11 1 yr N/A N/A128.870 to 132.995 Mb (de novo) 4D_DN MVB12B 4.1 Mb Loss 9q33.3q34.11 9yrs N/A N/A 127.213 to 131.263 Mb (de novo)

Another candidate gene was ascertained, multivesicular body subunit 12B(MVB12B), with enrichment of critical exons and also clustered withgenes in the ‘blue’ network that were reported to be impacted by de novoexonic mutations in individuals with neuropsychiatric conditions.Initially a single de novo duplication was observed in the cohort, butsubsequently 18 VOUS involving this gene were found, including 12 thatwere de novo in origin (8 deletions and 4 duplications) (FIG. 28). Thisincluded 3 recurrent de novo CNVs ascertained from developmental delaycases impacting only the MVB12B gene (FIG. 28 and Table 11 cases6D-DN,4G-DN and 3G). The protein encoded by this gene is an endosomalsorting complex required for transport (ESCRT-I) which is involved insorting of ubiquitinated cargo protein from the plasma membrane.Mutations in the ESCRT complex family have been implicated infrontotemporal dementia, a neurodegenerative disease. The data includeda de novo 839 kb duplication in a child (case 5G-DN) with Tourettesyndrome, attention deficit hyperactivity disorder and learningdisability. Another case had an adjacent de novo 700 kb duplication, andwas described as having severe intellectual disability, whereas anindividual (FIG. 28, case 3G) with a recurrent duplication was reportedto be normal, thereby demonstrating variable expression (or reducedpenetrance) of the duplication. Deletions were found in cases referredfor developmental delay and reported to have other comorbid conditions(Table 11). The 73 kb smaller transcript of this gene (NM_001011703.2)containing the ‘critical exons’ in the analysis was impacted by 12 ofthe reported de novo VOUS and no CNVs in controls. Quantitative PCRanalysis of 11 tissues for both MVB12B and PPP1R9A showed prenatal andadult brain-specific mRNA expression relative to that of the MED13 gene(or the ACTB gene).

The present data set showed enrichment of deletion variants (4 VOUS, 1de novo) containing another gene from the candidate list: the proteinphosphatase 1 regulatory subunit 9A (PPP1R9A) (MIM: 602468) gene.Deletions within PPP1R9A gene were identified in developmental disordercases and controls. The breakpoints of 13 VOUS deletions were found toimpact PPP1R9A and nearby genes. The breakpoints include 4 de novo VOUSreported from developmental delay cases. There was no deletion found inthe control dataset. The shortest de novo deletion was 201 Kbascertained from a case (11D_DN) with developmental delay in the presentcohort. This particular de novo also impacted the PON gene family whereexonic deletions were also present in controls. The human proteinco-expression network revealed PPP1R9A gene is the within the blueprotein module and enriched for ‘critical exons’ and putative ASD genesreported to have de novo mutations.

Additional data sets revealed 18 CNVs (13 deletions and 5 duplications),including 3 additional de novo deletions impacting exons of the gene.PPP1R9A is the only gene from within the de novo variants that isenriched with critical exons and clustered within the blue proteinmodule. The nearby genes are impacted by polymorphic deletions incontrols, whereas no deletions encompassing PPP1R9A were identified fromthe control data set. A de novo missense mutation of this gene wasrecently reported in an ASD proband. This gene shows tissue-specificimprinting; it is maternally expressed in skeletal muscle, but bothalleles are expressed in other embryonic tissues, including the brain.The protein encoded by this gene, Neurabin I, is a key candidatemolecule in synaptic formation and function. PPP1R9A also has a role insynaptic structure and function, spine motility and neurite formation.Enrichment of critical exons of this gene within adult brain regionswere found, and it is part of the ‘blue’ module. An individual withautism was reported to have a de novo missense variant of PPP1R9A and arare LOF mutation was reported in a schizophrenia case. This gene alsoshows increased expression in brain from individuals with bipolardisorder compared with controls. Upon further investigation of the‘blue’ protein co-expression network, highly connected neighboring(first degree) genes of PPP1R9A were reported to have de novo mutationsin individuals with neuropsychiatric conditions. In the present cohort,a focal de novo 201 kb deletion affecting this gene was found in a child(Table 11, case 11D-DN) referred for language delay, repetitivebehaviors and sensory sensitivities, consistent with her diagnosis ofASD. A 2 Mb deletion impacting 15 genes - including PPP 1R9A and thesarcoglycan epsilon (SGCE) (MIM: 604149) gene—was found in a 12 year oldgirl (case 7D) referred for myoclonus dystonia, short stature, failureto thrive, severe anxiety and obsessive-compulsive behaviour. She alsohad mild dysmorphic features (triangular facies, broad forehead, thinlips) but no cognitive concerns were reported.

The brain transcriptome and proteomic method implemented here can infercandidate genes from within the boundaries of a pathogenic or VOUS CNVthat are likely to impact brain-related conditions. Thus, the presentmethods represent a quantifiable approach to screen for genes that arecandidates to be involved in developmental disorders such asneurodevelopmental disorders, through the coordinated application ofmultiple genome-scale data sets. The critical exon approach reveals anegative selective pressure, whereas the protein expression analysisbrings out networks that are in pathways biologically relevant toneurodevelopmental disorders. Through the approach described in thisstudy, it is demonstrated that the combined use of different types ofmolecular data from the human brain may be used to interpret andidentify candidate genes for developmental disorders, from pathogenicvariants and VOUS. This approach may also be developed using comparabletissue-specific molecular tools to reveal candidates for other classesof disorders.

Relevant portions of references referred to herein are incorporated byreference.

1. A method of detecting a critical exon associated with a disease in anucleic acid sample from a mammal, comprising the steps of: i) detectingexons within target genes in the nucleic acid sample which exhibit anexpression level greater than the 75th percentile of exon expressionlevels in genes from one or more mammals having the disease usingnucleic acid primers that target exons within the target genes; ii)detecting rare mutations which occur at a frequency of less than 0.05 orde novo sequence mutations not possessed by either parent of the nucleicacid sample within each exon detected in step i) and determining theburden of mutation of each exon by dividing the number of mutations ineach exon by the length of the exon; and iii) identifying an exon as acritical exon when the burden of mutation of the exon is less than the75^(th) percentile of mutations in the exon and inversely correlateswith the exon expression level.
 2. The method of claim 1, wherein thedisease is a pathological condition in a developmental process of thecentral nervous system, brain, heart, kidney or lungs.
 3. The method ofclaim 1, wherein the disease involves a developmental process of thebrain.
 4. The method of claim 1, wherein the nucleic acid-containingsample is selected from neocortex, amygdala, cerebellar cortex,hippocampus, mediodorsal nucleus of the thalamus, and striatum.
 5. Themethod of claim 4, wherein the sample is a cerebellar cortex sample. 6.The method of claim 1, wherein the sample is a prenatal, child or adultsample.
 7. The method of claim 1, wherein the rare or de novo mutationsare selected from the group of missense, frameshift, nonsense,splice-site variants and copy number variations.
 8. The method of claim7, wherein the mutations are missense mutations.
 9. The method of claim1, additionally comprising conducting a gene co-expression analysisbefore step i) to detect in the sample genes that encode proteinsrelevant to the disease for analysis in step i).
 10. The method of claim9, wherein the co-expression analysis is a Weighted Gene Co-expressionNetwork Analysis.
 11. The method of claim 1, wherein the nucleic acidsample is obtained from one or more of the neocortex, amygdala (AMY),cerebellar cortex (CBC), hippocampus (HIP), mediodorsal nucleus of thethalamus (MD) and striatum (STR).
 12. The method of claim 11, whereinthe expression level of Fragile X mental retardation target (FMRP)genes, Forkhead box P2 (FOXP2) target genes, post-synaptic proteome(PSP) target genes and/or autism spectral disorder (ASD) target genes isdetermined in step i).
 13. A method of detecting a critical exonassociated with a disease which is a neuropsychiatric disorder selectedfrom Autism Spectral disorder (ASD), schizophrenia (SZ), intellectualdisability (ID), Fragile X syndrome, epilepsy and nervous disorders in anucleic acid sample from a mammal, comprising the steps of: i) detectingexons within target genes in the nucleic acid sample which exhibit anexpression level greater than the 75th percentile of exon expressionlevels in genes from one or more mammals having the disease usingnucleic acid primers that target exons within target genes selected fromthe group consisting of: (SEQ ID NO: 1) TGACATCGCACGTCCATCTG and(SEQ ID NO: 2) ACAGGTAGTAAATGGCCCAGG; (SEQ ID NO: 3)GAGCCTTCCTGATCCCCTAT and (SEQ ID NO: 4) ACTGTTTCCACGGCAGTGT;(SEQ ID NO: 1) TGACATCGCACGTCCATCTG and (SEQ ID NO: 4)ACTGTTTCCACGGCAGTGT; (SEQ ID NO: 5) CGGCTGCTGTGCTATCATTC and(SEQ ID NO: 6) CAGCCAACACCCTTCCAGAT; (SEQ ID NO: 3) GAGCCTTCCTGATCCCCTATand (SEQ ID NO: 6) CAGCCAACACCCTTCCAGAT; (SEQ ID NO: 7)GGCTGGATAAGCCAGGTCAG and (SEQ ID NO: 8) GGAAAGAGTTGTAGCTCCCGA;(SEQ ID NO: 9) CCTGTTCTTCCGTGGAGTGA and (SEQ ID NO: 10)CATGAAGCCCACGATGGAGA; (SEQ ID NO: 11) GGCTGGATAAGCCAGGTCAG and(SEQ ID NO: 12) TACTCATCCACCAGGGCTGT; (SEQ ID NO: 13)CCTCAACAGACACTGCCGTA and (SEQ ID NO: 14) GTCTCCCGCCGAGTAAGAAG;(SEQ ID NO: 15) CCTCATCCCCATGGTGACATC and (SEQ ID NO: 16)TTACTGCGGTTGTGCAGGAG; (SEQ ID NO: 17) GACTTTGTCTTCGCCCCAGA and(SEQ ID NO: 18) CTACCATCAAGCCGTCCCTG; (SEQ ID NO: 19)TCCGACATTCTCACTTGCCA and (SEQ ID NO: 20) CCAGGGTCAGCACAAGTCAT;(SEQ ID NO: 21) GGACTGTTCTCTGCTGGGAC and (SEQ ID NO: 22)GTTCGGAACCAGTACTCGCC; (SEQ ID NO: 23) CCGCATCCTGATGTGTCTGA and(SEQ ID NO: 24) TTGCAGGTGGATACGTGACT; (SEQ ID NO: 25)TTC ATC CCA ATT CAG GAG AC and (SEQ ID NO: 26)CAT GAT CCG AAT GTC ACA AA; (SEQ ID NO: 27) AGC AGG TTT CTC ACT GGT TAand (SEQ ED NO: 28) GAT GCT GTC ATT CCA AGA GC; (SEQ ED NO: 29)GCC TTG ACT TAT CCG AAT TG and (SEQ ID NO: 30)ACC TCG TCA TAC ACG TCC A; and (SEQ ID NO: 31) ATT GCC GAC AGG ATG CAG Aand (SEQ ED NO: 32) GAG TAC TTG CGC TCA GGA GGA.

ii) detecting rare mutations which occur at a frequency of less than0.05 or de novo sequence mutations not possessed by either parent of thenucleic acid sample within each exon detected in step i) and determiningthe burden of mutation of each exon by dividing the number of mutationsin each exon by the length of the exon; and iii) identifying an exon asa critical exon when the burden of mutation of the exon is less than the75^(th) percentile of mutations in the exon and inversely correlateswith the exon expression level.
 14. The method of claim 13, wherein thenucleic acid-containing sample is selected from neocortex, amygdala,cerebellar cortex, hippocampus, mediodorsal nucleus of the thalamus, andstriatum.
 15. The method of claim 13, wherein the sample is a prenatal,child or adult sample.
 16. The method of claim 13, wherein the rare orde novo mutations are selected from the group of missense, frameshift,nonsense, splice-site variants and copy number variations.
 17. Themethod of claim 13, additionally comprising conducting a geneco-expression analysis to identify genes encoding proteins relevant tothe disease for analysis in step i).
 18. The method of claim 17, whereinthe co-expression analysis is a Weighted Gene Co-expression NetworkAnalysis.
 19. A computer program product, comprising a tangiblecomputer-readable medium carrying computer-usable instructions which,when executed by a processing unit of a computer, cause the processingunit to identify one or more critical exons associated with a disease ina mammal according to the method defined in claim
 1. 20. A computersystem for use to identify one or more critical exons associated with adisease in a mammal, comprising: a memory for storing instructions; andat least one processing unit coupled to the memory for executing theinstructions stored in the memory, wherein the instructions, whenexecuted by the at least one processing unit, cause the computer systemto conduct the steps of the method defined in claim 1.